Treponema pallidum Polynucleotides and Sequences



FIELD OF THE INVENTION 



The present invention relates to the field of molecular biology. In particular, it relates to,
among other things, nucleotide sequences of Treponema pallidum, contigs, ORFs, fragments,
probes, primers and related polynucleotides thereof, peptides and polypeptides encoded by the
sequences, and uses of the polynucleotides and sequences thereof, such as in fermentation,
polypeptide production, assays and pharmaceutical development, among others.



Spirochetes are a family of motile, unicellular, spiral-shaped bacteria which share a
number of structural characteristics. Three genera of the spirochetes are pathogenic in humans:
(a) Treponema, which includes the pathogens that cause syphilis (T. pallidum), yaws
(T. pertenue), and pinta (T. carateum); (b) Borrelia, which includes the pathogens that cause
epidemic and endemic relapsing fever and Lyme disease; and (c) Leptospira, which includes a
wide variety of small spirochetes that cause mild to serious systemic human illness (Koff, A. B.
and Rosen, T. J. Am. Acad. Dermatol. 29:519-535 (1993)). In 1986, more than 27,000 cases
of early infectious syphilis were diagnosed in the United States alone. Such statistics indicate
that infection with T. pallidum is the largest source of human disease resulting from the
spirochetes.

T. pallidum is morphologically indistinguishable from several other pathogenic
spirochetes, but, in general, treponemes and other spirochetes are easily identifiable when
compared to other bacteria. A key morphological characteristic of T. pallidum, and other
spirochetes, is the presence of a central protoplasmic cylinder composed primarily of
peptidoglycan and one or more adjacent axial fibrils (also designated periplasmic flagella or
endoflagella; Charon, N. W., et al. Res. Microbiol. 143:597-603 (1992)). These structures
provide a source of corkscrew-like motion to the treponemes. In aqueous media, treponemes
move in an apparently random fashion and, unlike the majority of motile bacteria, continue to
move in a more viscous medium. In tissues, treponemes are highly moldable to intercellular
spaces, a characteristic which is thought to be mediated by the interactions of bacterial adhesins
and cellular fibronectins.

Syphilis is the primary clinical manifestation of infection with T. pallidum. The clinical
manifestations of syphilis can resemble many diseases. Syphilis is typically transmitted by
sexual contact, but can also be transmitted transplacentally. The infecting organism multiplies at
the site of infection within 10 to 60 days postinfection and results in a primary ulcer-like lesion
termed a chancre. A small number of organisms move from the primary lesion to the regional
lymph nodes and establish small infectious centers termed satellite buboes. Organisms from
these locations enter the blood stream and result in a systemic infection (Goens, J. L., et al. Am.
Fam. Physician 50:1013-1020 (1994)).

The secondary stage of syphilis manifests itself as a widespread skin rash and begins
between two and twelve weeks following the primary infection. During this stage, the infected
individual often experiences a low grade fever coupled with swollen lymph nodes. Also during
this period, lesions of various degrees of severity may develop in a number of physical locations
including bone, liver, kidney, central nervous system (CNS), and other organs (Veeravahu, M.
Arch. Intern. Med. 145:132-134 (1985)). Such secondary infections are highly infectious, but
will, in time, subside spontaneously.

A third stage of syphilis occurs in approximately 30% of infected, but not treated,
individuals. The third stage occurs several years following the first and second stages. The
lesions which characterize the third stage of infection are minor in terms of the number of
organisms, but may be severe in terms of tissue damage. Such lesions may result in necrosis,
scar formation, general paresis, damage to aortic valves, permanent blindness, and other
extensive tissue damage, all probably related to a delayed type hypersensitivity reaction by the
host to the T. pallidum organisms (Scheck, D. N. and Hook, E. W. 3rd Infect. Dis. Clin. North
Am. 8:769-795 (1994)).

A further, and increasingly common, complication of syphilis infection is coinfection
with the human immunodeficiency virus (HIV). In fact, a recent study indicates that ulcerous
genital diseases such as those exhibited during the primary stages of infection with syphilis may
facilitate the transmission of HIV (Rufli, T. Dermatologica 179:113-117 (1989)). In addition, it
is clear that the CNS is regularly involved in the early stages of syphilis. In the timespan
between the introduction of penicillin and other antibiotics and the spread of HIV, early
neurosyphilis was an exceptionally uncommon development. However, since the standard
antibiotic dosage used to treat syphilis is not exceptionally high and since a successful treatment
requires an adequate host immune response, individuals infected with HIV often exhibit a highly
increased occurrence of many neurosyphilis-related sequelae including asymptomatic
neurosyphilis, syphilitic meningitis, cranial nerve abnormalities, or cerebrovascular problems
(Musher, D. M., et al. Ann. Intern. Med. 113:872-881 (1990)).

T. pallidum has a remarkable ability to evade both the humoral and cellular components of
the immune system. It was originally thought that the ability of T. pallidum to evade the immune
system of the host organism was due to the presence of an outer coat of mucopolysaccharides.
However, recent evidence suggests it is more likely that T. pallidum makes use of the organization
and the relative immunogenicity of its complement of outer membrane proteins to evade the
immune system (Radolf, J. D. Mol. Microbiol. 16:1067-1073 (1995)). Unlike most other
bacterial outer membranes characterized thus far, the T. pallidum outer membrane contains a
scarcity of immunogenic transmembrane proteins (with regard to T. pallidum, these are termed
"rare outer membrane proteins"). Among the highly immunogenic proteins of treponemes are a
number of lipoproteins anchored to the periplasmic leaflet of the cytoplasmic membrane. As a
result of their physical location, the lipoproteins may be less susceptible to typical immunologic
surveillance (Norris, J. Microbiol. Rev. 57:750-779 (1993)). In addition to the periplasmic
lipoproteins, T. pallidum also secretes a number of small, but immunogenic, proteins which may
induce an immune response (Hindersson, P. et al. Res. Microbiol. 143:629-639 (1992)).
It is clear that the etiology of diseases mediated or exacerbated by T. pallidum involves
T. pallidum genes, and that characterizing the genes and their patterns of expression would add
dramatically to our understanding of the organism and its host interactions. Knowledge of
T. pallidum genes and genomic organization would dramatically improve understanding of disease
etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing
diseases. Moreover, characterized genes and genomic fragments of T. pallidum would provide
reagents for, among other things, detecting, characterizing and controlling T. pallidum
infections. There is a need therefore to characterize the genome of T. pallidum and for
polynucleotides and sequences of this organism.

SUMMARY OF THE INVENTION

The present invention is based on the sequencing of fragments of the T. pallidum
genome. The primary nucleotide sequences which were generated are provided in SEQ ID
NOS: 1-744.

The present invention provides the nucleotide sequence of several thousand contigs of the
T. pallidum genome, which are listed in tables below and set out in the Sequence Listing
submitted herewith, and representative fragments thereof, in a form which can be readily used,
analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is
provided as contiguous strings of primary sequence information corresponding to the nucleotide
sequences depicted in SEQ ID NOS: 1-744.
The present invention further provides nucleotide sequences which are at least 95%
identical to the nucleotide sequences of SEQ ID NOS: 1-744.

The nucleotide sequence of SEQ ID NOS: 1-744, a representative fragment thereof, or a
nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS:
1-744 may be provided in a variety of mediums to facilitate its use. In one application of this
embodiment, the sequences of the present invention are recorded on computer readable media.
Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard
disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical
storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical
storage media.

The present invention further provides systems, particularly computer-based systems
which contain the sequence information herein described stored in a data storage means. Such
systems are designed to identify commercially important fragments of the T. pallidum genome.

Another embodiment of the present invention is directed to fragments of the T. pallidum
genome having particular structural or functional attributes. Such fragments of the T. pallidum
genome of the present invention include, but are not limited to, fragments which encode peptides,
hereinafter referred to as open reading frames or ORFs, fragments which modulate the
expression of an operably linked ORF, hereinafter referred to as expression modulating
fragments or EMFs, and fragments which can be used to diagnose the presence of T. pallidum
in a sample, hereinafter referred to as diagnostic fragments or DFs.

Each of the ORFs in fragments of the T. pallidum genome disclosed in Tables 1, 2 and 3,
and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents.
For instance, the sequences can be used as diagnostic probes or amplification primers for
detecting or determining the presence of a specific microbe in a sample, to selectively control
gene expression in a host, and in the production of polypeptides, such as polypeptides encoded by
ORFs of the present invention, particularly those polypeptides that have a pharmacological
activity.

The present invention further includes recombinant constructs comprising one or more
fragments of the T. pallidum genome of the present invention. The recombinant constructs of the
present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of
the T. pallidum genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments
of the T. pallidum genome of the present invention. The host cells can be a higher eukaryotic
host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic
cell such as a bacterial cell.
The present invention is further directed to isolated polypeptides and proteins encoded by
ORFs of the present invention. A variety of methods, well known to those of skill in the art,
routinely may be utilized to obtain any of the polypeptides and proteins of the present invention.
For instance, polypeptides and proteins of the present invention having relatively short, simple
amino acid sequences readily can be synthesized using commercially available automated peptide
synthesizers. Polypeptides and proteins of the present invention also may be purified from
bacterial cells which naturally produce the protein. Yet another alternative is to purify
polypeptides and proteins of the present invention from cells which have been altered to express
them.

The invention further provides methods of obtaining homologs of the fragments of the
T. pallidum genome of the present invention and homologs of the proteins encoded by the ORFs of
the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed
herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and
proteins of the present invention. Such antibodies include both monoclonal and polyclonal
antibodies.

The invention further provides hybridomas which produce the above-described
antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific
monoclonal antibody.

The present invention further provides methods of identifying test samples derived from
cells which express one of the ORFs of the present invention, or a homolog thereof. Such
methods comprise incubating a test sample with one or more of the antibodies of the present
invention, or one or more of the DFs of the present invention, under conditions which allow a
skilled artisan to determine if the sample contains the ORF or product produced therefrom.

In another embodiment of the present invention, kits are provided which contain the
necessary reagents to carry out the above-described assays.

Specifically, the invention provides a compartmentalized kit to receive, in close
confinement, one or more containers, which comprises: (a) a first container comprising one of the
antibodies, or one of the DFs, of the present invention; and (b) one or more other containers
comprising one or more of the following: wash reagents, reagents capable of detecting presence
of bound antibodies or hybridized DFs.

Using the isolated proteins of the present invention, the present invention further provides
methods of obtaining and identifying agents capable of binding to a polypeptide or protein
encoded by one of the ORFs of the present invention. Specifically, such agents include, as
further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like.
Such methods comprise steps of: (a) contacting an agent with an isolated protein encoded by one
of the ORFs of the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of T. pallidum will be of great value to all laboratories
working with this organism and for a variety of commercial purposes. Many fragments of the
T. pallidum genome will be immediately identified by similarity searches against GenBank or
protein databases and will be of immediate value to T. pallidum researchers and of immediate
commercial value for the production of proteins or to control gene expression.

The methodology and technology for elucidating extensive genomic sequences of
bacterial and other genomes have greatly enhanced, and will continue to enhance, the ability to
analyze and understand chromosomal organization. In particular, sequenced contigs and genomes
will provide the models for developing tools for the analysis of chromosome structure and
function, including the ability to identify genes within large segments of genomic DNA, the
structure, position, and spacing of regulatory elements, the identification of genes with
potential industrial applications, and the ability to do comparative genomics and molecular
phylogeny.

DESCRIPTION OF THE FIGURES

FIGURE 1 is a block diagram of a computer system (102) that can be used to
implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs used
to collect, assemble, edit and annotate the contigs of the T. pallidum genome of the present
invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence
data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual
Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press,
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector
sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh
platform and parses the feature data extracted from the sequence files by Factura to the Unix
based T. pallidum relational database. Assembly of contigs (and whole genome sequences) is
accomplished by retrieving a specific set of sequence files and their associated features using
Extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence
file is processed to trim portions of the sequences with a high rate of ambiguous nucleotides. The
sequence files were assembled using TIGR Assembler, an assembly engine designed at The
Institute for Genomic Research (TIGR) for rapid and accurate assembly of thousands of
sequence fragments. The collection of contigs generated by the assembly step is loaded into the
database with the lassie program. Identification of open reading frames (ORFs) is accomplished
by processing contigs with zorf. The ORFs are searched against T. pallidum sequences from
GenBank and against all protein sequences using the BLASTN and BLASTP programs (using
default parameters), described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Results of
the ORF determination and similarity searching steps were loaded into the database. As
described below, some results of the determination and the searches are set out in Tables 1-3.
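
The ORF identification step can be illustrated with a minimal sketch. It is not the zorf
program, whose internals are not described here; it assumes the simplest possible rule (an ATG
start codon followed in frame by a TAA, TAG or TGA stop codon, forward strand only), and the
function name find_orfs and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed rule)


def find_orfs(seq, min_codons=30):
    """Scan the forward strand of a contig for simple ATG..stop ORFs.

    Returns (start, end, frame) tuples using 1-based inclusive positions,
    where end is the last base of the stop codon. Illustrative only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the current candidate start codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Number of codons before the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

A production ORF caller would also scan the reverse complement and consider alternative start
codons; those are omitted to keep the sketch short.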



DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention is based on the sequencing of fragments of the T. pallidum genome
and analysis of the sequences. The primary nucleotide sequences generated by sequencing the
fragments are provided in SEQ ID NOS: 1-744. As used herein, the "primary sequence" refers
to the nucleotide sequence represented by the IUPAC nomenclature system.

In addition to the aforementioned T. pallidum polynucleotide and polynucleotide
sequences, the present invention provides the nucleotide sequences of SEQ ID NOS: 1-744, ORF
IDs and ORFs within, or representative fragments thereof, in a form which can be readily used,
analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID
NOS: 1-744" refers to any portion of the SEQ ID NOS: 1-744 which is not presently represented
within a publicly available database. Preferred representative fragments of the present invention
are T. pallidum open reading frames (ORFs), expression modulating fragments (EMFs) and
fragments which can be used to diagnose the presence of T. pallidum in a sample (DFs). A
non-limiting identification of preferred representative fragments is provided in Tables 1-3. As
discussed in detail below, the information provided in SEQ ID NOS: 1-744 and in Tables 1-3
together with routine cloning, synthesis, sequencing and assay methods will enable those skilled
in the art to clone and sequence all "representative fragments" of interest, including open reading
frames encoding a large variety of T. pallidum proteins.

The present invention is further directed to nucleic acid molecules encoding portions or
fragments of the nucleotide sequences described herein. Fragments include portions of the
nucleotide sequences of SEQ ID NOS: 1-744, at least 10 contiguous nucleotides in length, selected
from any two integers, one of which represents a 5' nucleotide position and a second of which
represents a 3' nucleotide position, where the first nucleotide for each nucleotide sequence in
SEQ ID NOS: 1-744 is position 1. That is, every combination of a 5' and 3' nucleotide position
that a fragment at least 10 contiguous nucleotides in length could occupy is included in the
invention. "At least" means a fragment may be 10 contiguous nucleotide bases in length or any
integer between 10 and the length of an entire nucleotide sequence of SEQ ID NOS: 1-744 minus
1. Therefore, included in the invention are contiguous fragments specified by any 5' and 3'
nucleotide base positions of a nucleotide sequence of SEQ ID NOS: 1-744 wherein the length of
the contiguous fragment is any integer between 10 and the length of an entire nucleotide sequence
minus 1.
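
The position-based definition above can be made concrete with a short sketch. It assumes
1-based, inclusive 5' and 3' positions and fragment lengths from 10 up to the sequence length
minus 1, as stated; the function names count_fragments and fragment are hypothetical and are
introduced only for illustration.

```python
def count_fragments(seq_len, min_len=10):
    """Count the (5', 3') position pairs that define a permitted fragment.

    Permitted fragment lengths run from min_len to seq_len - 1 inclusive.
    """
    max_len = seq_len - 1
    if max_len < min_len:
        return 0
    # A fragment of length k can start at positions 1 .. seq_len - k + 1.
    return sum(seq_len - k + 1 for k in range(min_len, max_len + 1))


def fragment(seq, five_prime, three_prime, min_len=10):
    """Return the fragment spanning 1-based inclusive positions five_prime..three_prime."""
    if not 1 <= five_prime <= three_prime <= len(seq):
        raise ValueError("positions must satisfy 1 <= 5' <= 3' <= sequence length")
    length = three_prime - five_prime + 1
    if not min_len <= length <= len(seq) - 1:
        raise ValueError("fragment length outside the permitted range")
    return seq[five_prime - 1:three_prime]


if __name__ == "__main__":
    print(count_fragments(12))
    print(fragment("ACGTACGTACGT", 2, 11))
```

For a hypothetical 12-nucleotide sequence there are three fragments of length 10 and two of
length 11, so five position pairs in total.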

Further, the invention includes polynucleotides comprising fragments specified by size,
in nucleotides, rather than by nucleotide positions. The invention includes any fragment size, in
contiguous nucleotides, selected from integers between 10 and the length of an entire ORF ID,
ORF, or SEQ ID NO:, minus 1. Preferred sizes of contiguous nucleotide fragments include 20
nucleotides, 30 nucleotides, 40 nucleotides, and 50 nucleotides. Other preferred sizes of
contiguous nucleotide fragments, which may be useful as diagnostic probes and primers, include
fragments 50-300 nucleotides in length which include, as discussed above, fragment sizes
representing each integer between 50-300. Larger fragments are also useful according to the
present invention, corresponding to most, if not all, of the nucleotide sequences shown in Tables
1-3 (ORF IDs) and SEQ ID NOS: 1-744. The preferred sizes are, of course, meant to exemplify,
not limit, the present invention, as all size fragments, representing any integer between 10 and
the length of an entire nucleotide sequence minus 1, of each ORF ID, ORF, and SEQ ID NO:, are
included in the invention.

The present invention also provides for the exclusion of any fragment, specified by 5'
and 3' base positions or by size in nucleotide bases as described above, for any ORF ID or SEQ
ID NOS: 1-744. Any number of fragments of nucleotide sequences in ORF IDs or SEQ ID
NOS: 1-744, specified by 5' and 3' base positions or by size in nucleotides, as described above,
may be excluded from the present invention.

While the presently disclosed sequences of SEQ ID NOS: 1-744 are highly accurate,
sequencing techniques are not perfect and, in relatively rare instances, further investigation of a
fragment or sequence of the invention may reveal a nucleotide sequence error present in a
nucleotide sequence disclosed in SEQ ID NOS: 1-744. However, once the present invention is
made available (i.e., once the information in SEQ ID NOS: 1-744 and Tables 1-3 has been made
available), resolving a rare sequencing error in SEQ ID NOS: 1-744 will be well within the skill
of the art. The present disclosure makes available sufficient sequence information to allow any of
the described contigs or portions thereof to be obtained readily by straightforward application of
routine techniques. Further sequencing of such polynucleotides may proceed in like manner using
manual and automated sequencing methods which are employed ubiquitously in the art. Nucleotide
sequence editing software is publicly available. For example, Applied Biosystem's (AB)
AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By
employing such routine techniques, potential errors readily may be identified and the correct
sequence then may be ascertained by targeting further sequencing effort, also of a routine nature,
to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS: 1-744 were corrected, the
resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least
99% identical, and the great majority would be at least 99.9% identical to the nucleotide
sequences of SEQ ID NOS: 1-744.

As discussed elsewhere herein, polynucleotides of the present invention readily may be
obtained by routine application of well known and standard procedures for cloning and
sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided
below, for instance. A wide variety of T. pallidum strains can be used to prepare T. pallidum
genomic DNA for cloning and for obtaining polynucleotides of the present invention which are
known in the art.

The nucleotide sequences of the genomes from different strains of T. pallidum differ
somewhat. However, the nucleotide sequences of the genomes of all T. pallidum strains will be
at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID
NOS: 1-744 and the ORF IDs and ORFs within. Nearly all will be at least 99% identical and the
great majority will be 99.9% identical.

The present application is further directed to nucleic acid molecules at least 90%, 95%,
96%, 97%, 98% or 99% identical to a nucleic acid sequence shown in SEQ ID NOS: 1-744, the
ORF IDs and ORFs within. The above nucleic acid sequences are included irrespective of
whether they encode a polypeptide having T. pallidum activity. This is because even where a
particular nucleic acid molecule does not encode a polypeptide having T. pallidum activity, one of
skill in the art would still know how to use the nucleic acid molecule, for instance, as a
hybridization probe. Uses of the nucleic acid molecules of the present invention that do not
encode a polypeptide having T. pallidum activity include, inter alia, isolating a T. pallidum gene
or allelic variants thereof from a DNA library, and detecting T. pallidum mRNA expression in
samples, including environmental samples, suspected of containing T. pallidum by Northern Blot,
PCR, or similar analysis.

Preferred are nucleic acid molecules having sequences at least 90%, 95%, 96%, 97%,
98% or 99% identical to the nucleic acid sequence shown in SEQ ID NOS: 1-744, the ORF IDs,
and the ORF within each ORF ID, which do, in fact, encode a polypeptide having T. pallidum
protein activity. By "a polypeptide having T. pallidum activity" is intended polypeptides
exhibiting activity similar, but not necessarily identical, to an activity of the T. pallidum protein
of the invention, as measured in a particular biological assay suitable for measuring activity of the
specified protein.

Due to the degeneracy of the genetic code, one of ordinary skill in the art will immediately
recognize that a large number of the nucleic acid molecules having a sequence at least 90%, 95%,
96%, 97%, 98%, or 99% identical to the nucleic acid sequences shown in SEQ ID NOS: 1-744,
the ORF IDs, and the ORF within each ORF ID, will encode a polypeptide having T. pallidum
protein activity. In fact, since degenerate variants of these nucleotide sequences all encode the
same polypeptide, this will be clear to the skilled artisan even without performing the above
described comparison assay. It will be further recognized in the art that, for such nucleic acid
molecules that are not degenerate variants, a reasonable number will also encode a polypeptide
having T. pallidum protein activity. This is because the skilled artisan is fully aware of amino
acid substitutions that are either less likely or not likely to significantly affect protein function
(e.g., replacing one aliphatic amino acid with a second aliphatic amino acid), as further described
below.

The biological activity or function of the polypeptides of the present invention is
expected to be similar or identical to that of polypeptides from other bacteria that share a high
degree of structural identity/similarity. Tables 1-3 list accession numbers and descriptions for
the closest matching sequences of polypeptides available through GenBank. It is therefore
expected that the biological activity or function of the polypeptides of the present invention will
be similar or identical to that of those polypeptides from other bacterial genera, species, or
strains listed in Tables 1-3.

By a polynucleotide having a nucleotide sequence at least, for example, 95% "identical"
to a reference nucleotide sequence of the present invention, it is intended that the nucleotide
sequence of the polynucleotide is identical to the reference sequence except that the
polynucleotide sequence may include up to five point mutations per each 100 nucleotides of the
reference nucleotide sequence encoding the T. pallidum polypeptide. In other words, to obtain a
polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide
sequence, up to 5% of the nucleotides in the reference sequence may be deleted, inserted, or
substituted with another nucleotide. The query sequence may be an entire sequence shown in
SEQ ID NOS: 1-744, the ORF IDs, or the ORF within each ORF ID, or any fragment specified
as described herein.
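
The arithmetic behind the identity threshold can be sketched briefly. The function below is an
illustrative helper only; its name, max_differences, and its parameters are hypothetical. It
returns the largest whole number of deleted, inserted or substituted nucleotides consistent with
a stated minimum percent identity for a reference sequence of a given length.

```python
import math
from fractions import Fraction


def max_differences(ref_len, min_identity_pct):
    """Largest number of point differences allowed at the given percent identity.

    Exact rational arithmetic avoids floating point rounding for values such as 99.9.
    """
    pct = Fraction(str(min_identity_pct))
    if not 0 <= pct <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    allowed = Fraction(ref_len) * (100 - pct) / 100
    return math.floor(allowed)


if __name__ == "__main__":
    print(max_differences(100, 95))
    print(max_differences(1000, 99.9))
```

At 95% identity this gives five differences per 100 reference nucleotides, matching the
definition above.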

As a practical matter, whether any particular nucleic acid molecule or polypeptide is at
least 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of the present
invention can be determined conventionally using known computer programs. A preferred
method for determining the best overall match between a query sequence (a sequence of the
present invention) and a subject sequence, also referred to as a global sequence alignment, can be
determined using the FASTDB computer program based on the algorithm of Brutlag et al. See
Brutlag et al. (1990) Comp. App. Biosci. 6:237-245. In a sequence alignment the query and
subject sequences are both DNA sequences. An RNA sequence can be compared by first
converting U's to T's. The result of said global sequence alignment is in percent identity.



WO 98/59034



Preferred parameters used in a FASTDB alignment of DNA sequences to calculate percent
identity are: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05,
Window Size=500 or the length of the subject nucleotide sequence, whichever is shorter.
If the subject sequence is shorter than the query sequence because of 5' or 3' deletions,
not because of internal deletions, a manual correction must be made to the results. This is
because the FASTDB program does not account for 5' and 3' truncations of the subject sequence
when calculating percent identity. For subject sequences truncated at the 5' or 3' ends, relative to
the query sequence, the percent identity is corrected by calculating the number of bases of the
query sequence that are 5' and 3' of the subject sequence, which are not matched/aligned, as a
percent of the total bases of the query sequence. Whether a nucleotide is matched/aligned is
determined by results of the FASTDB sequence alignment. This percentage is then subtracted
from the percent identity, calculated by the above FASTDB program using the specified
parameters, to arrive at a final percent identity score. This corrected score is what is used for the
purposes of the present invention. Only nucleotides outside the 5' and 3' nucleotides of the
subject sequence, as displayed by the FASTDB alignment, which are not matched/aligned with
the query sequence, are calculated for the purposes of manually adjusting the percent identity
score.

For example, a 90 nucleotide subject sequence is aligned to a 100 nucleotide query sequence to
determine percent identity. The deletions occur at the 5' end of the subject sequence and
therefore, the FASTDB alignment does not show a match/alignment of the first 10 nucleotides
at the 5' end. The 10 unpaired nucleotides represent 10% of the sequence (number of nucleotides at
the 5' and 3' ends not matched/total number of nucleotides in the query sequence) so 10% is
subtracted from the percent identity score calculated by the FASTDB program. If the remaining
90 nucleotides were perfectly matched the final percent identity would be 90%. In another
example, a 90 nucleotide subject sequence is compared with a 100 nucleotide query sequence.
This time the deletions are internal deletions so that there are no nucleotides on the 5' or 3' of the
subject sequence which are not matched/aligned with the query. In this case the percent identity
calculated by FASTDB is not manually corrected. Once again, only nucleotides 5' and 3' of the
subject sequence which are not matched/aligned with the query sequence are manually corrected
for. No other manual corrections are to be made for the purposes of the present invention.
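
The manual correction described above reduces to a single subtraction. The following is a minimal sketch of that arithmetic only; it does not run FASTDB, and the function and parameter names are hypothetical, chosen for illustration. It assumes the FASTDB percent identity and the counts of unmatched query bases at each end have already been read from the alignment output.

```python
def corrected_percent_identity(fastdb_identity, query_length,
                               unmatched_5prime=0, unmatched_3prime=0):
    """Apply the manual end-truncation correction to a FASTDB percent identity.

    fastdb_identity  -- percent identity as reported by the alignment program
    query_length     -- total number of bases in the query sequence
    unmatched_5prime -- query bases lying 5' of the subject that are not matched/aligned
    unmatched_3prime -- query bases lying 3' of the subject that are not matched/aligned
    (All names here are illustrative assumptions, not FASTDB output fields.)
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_5prime + unmatched_3prime
    if unmatched_5prime < 0 or unmatched_3prime < 0 or overhang > query_length:
        raise ValueError("unmatched end counts must lie between 0 and query_length")
    # Unmatched end bases expressed as a percent of the total query bases
    penalty = 100.0 * overhang / query_length
    return fastdb_identity - penalty


# First example from the text: 10 unmatched 5' bases in a 100-base query,
# with the remaining 90 bases perfectly matched.
print(corrected_percent_identity(100.0, 100, unmatched_5prime=10))  # 90.0
```

In the second example of the text (internal deletions only), both end counts are zero, so the function returns the FASTDB value unchanged.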

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS: 1-744, including ORF IDs and
corresponding ORFs, a representative fragment thereof, or a nucleotide sequence at least 95%,
preferably at least 99% and most preferably at least 99.9% identical to said polynucleotide
sequences may be "provided" in a variety of mediums to facilitate use thereof. As used herein,
"provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a
nucleotide sequence of the present invention. Such a manufacture provides a large portion of the
T. pallidum genome and parts thereof (e.g., a T. pallidum open reading frame (ORF)) in a form
which allows a skilled artisan to examine the manufacture using means not directly applicable to
examining the T. pallidum genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can
be recorded on computer readable media. As used herein, "computer readable media" refers to
any medium which can be read and accessed directly by a computer. Such media include, but are
not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM
and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled
artisan can readily appreciate how any of the presently known computer readable mediums can be
used to create a manufacture comprising computer readable medium having recorded thereon a
nucleotide sequence of the present invention. Likewise, it will be clear to those of skill how
additional computer readable media that may be developed also can be used to create analogous
manufactures having recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on computer
readable medium. A skilled artisan can readily adopt any of the presently known methods for
recording information on computer readable medium to generate manufactures comprising the
nucleotide sequence information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a
computer readable medium having recorded thereon a nucleotide sequence of the present
invention. The choice of the data storage structure will generally be based on the means chosen
to access the stored information. In addition, a variety of data processor programs and formats
can be used to store the nucleotide sequence information of the present invention on computer
readable medium. The sequence information can be represented in a word processing text file,
formatted in commercially-available software such as WordPerfect and Microsoft Word, or
represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase,
Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring
formats (e.g., text file or database) in order to obtain computer readable medium having recorded
thereon the nucleotide sequence information of the present invention.
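
By way of illustration only, the two structuring formats mentioned above (an ASCII text file and a database) can be sketched as follows. This sketch makes several assumptions that are not part of the disclosure: the FASTA-style text layout, the use of SQLite in place of DB2, Sybase or Oracle, the table and column names, and the placeholder record, which is invented and is not a sequence of SEQ ID NOS: 1-744.

```python
import sqlite3

# Placeholder record: identifier and bases are invented for illustration only.
EXAMPLE_RECORDS = {"example_contig_1": "ATGGCTAAAGGTCTGTAA"}


def write_ascii_file(records, path, width=60):
    """Record sequences in a plain ASCII file, one '>' header line per entry."""
    with open(path, "w", encoding="ascii") as handle:
        for name, bases in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(bases), width):
                handle.write(bases[i:i + width] + "\n")


def read_ascii_file(path):
    """Read back the records written by write_ascii_file."""
    records, name = {}, None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:]
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records


def store_in_database(records, connection):
    """Record sequences in a relational table (SQLite stands in for any database)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(name TEXT PRIMARY KEY, bases TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO sequences (name, bases) VALUES (?, ?)",
        list(records.items()),
    )
    connection.commit()


def fetch_from_database(connection, name):
    """Return the stored bases for one record, or None if it is absent."""
    row = connection.execute(
        "SELECT bases FROM sequences WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Either representation can then be read by whatever access means is chosen; the choice between them depends, as stated above, on how the stored information is to be accessed.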

Computer software is publicly available which allows a skilled artisan to access sequence
information provided in a computer readable medium. Thus, by providing in computer readable
form the nucleotide sequences of SEQ ID NOS: 1-744, including ORF IDs and corresponding
ORFs, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at
least 99% and most preferably at least 99.9% identical to said polynucleotide sequences, the
present invention enables the skilled artisan routinely to access the provided sequence information
for a wide variety of purposes.

The examples which follow demonstrate how software which implements the BLAST
(Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem.
17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading
frames (ORFs) within the T. pallidum genome which contain homology to ORFs or proteins
from both T. pallidum and from other organisms. Among the ORFs discussed herein are protein
encoding fragments of the T. pallidum genome useful in producing commercially important
proteins, such as enzymes used in fermentation reactions and in the production of commercially
useful metabolites.

The present invention further provides systems, particularly computer-based systems,
which contain the sequence information described herein. Such systems are designed to identify,
among other things, commercially important fragments of the T. pallidum genome.

As used herein, "a computer-based system" refers to the hardware means, software
means, and data storage means used to analyze the nucleotide sequence information of the present
invention. The minimum hardware means of the computer-based systems of the present
invention comprises a central processing unit (CPU), input means, output means, and data
storage means. A skilled artisan can readily appreciate that any one of the currently available
computer-based systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise a data
storage means having stored therein a nucleotide sequence of the present invention and the
necessary hardware means and software means for supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store nucleotide
sequence information of the present invention, or a memory access means which can access
manufactures having recorded thereon the nucleotide sequence information of the present
invention.

As used herein, "search means" refers to one or more programs which are implemented
on the computer-based system to compare a target sequence or target structural motif with the
sequence information stored within the data storage means. Search means are used to identify
fragments or regions of the present genomic sequences which match a particular target sequence
or target motif. A variety of known algorithms are disclosed publicly and a variety of
commercially available software for conducting search means are and can be used in the
computer-based systems of the present invention. Examples of such software include, but are
not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can
readily recognize that any one of the available algorithms or implementing software packages for
conducting homology searches can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of six or
more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the
longer a target sequence is, the less likely a target sequence will be present as a random
occurrence in the database. The most preferred sequence length of a target sequence is from
about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well
recognized that searches for commercially important fragments, such as sequence fragments
involved in gene expression and protein processing, may be of shorter length.

As used herein, "a target structural motif," or "target motif," refers to any rationally
selected sequence or combination of sequences in which the sequence(s) are chosen based on a
three-dimensional configuration which is formed upon the folding of the target motif. There are a
variety of target motifs known in the art. Protein target motifs include, but are not limited to,
enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited
to, promoter sequences, hairpin structures and inducible expression elements (protein binding
sequences).
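
The observation that a longer target sequence is less likely to occur at random in a database can be made concrete with a rough estimate. The sketch below is an illustration under simplifying assumptions that are not part of the disclosure: residues are treated as independent and equally frequent, only one strand is counted, and the database size of 1,000,000 residues is a hypothetical round figure. The function name is likewise hypothetical.

```python
def expected_chance_matches(target_length, database_length, alphabet_size):
    """Expected number of exact chance occurrences of one specific target.

    Assumes independent, equally frequent residues (alphabet_size = 4 for
    nucleotides, 20 for amino acids) and counts a single strand only.
    """
    positions = database_length - target_length + 1
    if positions <= 0:
        return 0.0
    return positions / alphabet_size ** target_length


DATABASE_LENGTH = 1_000_000  # hypothetical database size, for illustration

for length in (6, 10, 30):
    estimate = expected_chance_matches(length, DATABASE_LENGTH, 4)
    print(f"{length:>3} nucleotides: {estimate:.3g} expected chance matches")
```

Under these assumptions a six-nucleotide target is expected to occur by chance hundreds of times in such a database, whereas a 30-nucleotide target is expected essentially never, which is consistent with the preferred lengths stated above.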

A variety of structural formats for the input and output means can be used to input and
output the information in the computer-based systems of the present invention. A preferred
format for an output means ranks fragments of the T. pallidum genomic sequences possessing
varying degrees of homology to the target sequence or target motif. Such presentation provides a
skilled artisan with a ranking of sequences which contain various amounts of the target sequence
or target motif and identifies the degree of homology contained in the identified fragment.
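
A ranked output of the kind just described can be sketched as follows. This is a deliberately simple stand-in, not BLAST or BLAZE: it scores every ungapped window of the stored sequence against the target by percent identity and lists the best windows first. The function name, the toy sequence and the target are hypothetical and are used only for illustration.

```python
def rank_fragments(stored_sequence, target, top=3):
    """Rank ungapped windows of stored_sequence by percent identity to target.

    Returns up to `top` tuples of (percent_identity, start_position, fragment),
    with start_position counted from 1 at the 5' end, best match first.
    """
    n = len(target)
    hits = []
    for start in range(len(stored_sequence) - n + 1):
        window = stored_sequence[start:start + n]
        matches = sum(a == b for a, b in zip(window, target))
        hits.append((100.0 * matches / n, start + 1, window))
    # Highest identity first; ties broken by position along the sequence
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]


# Toy data, invented for illustration
for identity, position, fragment in rank_fragments("TTGACAGGATCCTTGACTGG", "TTGACA"):
    print(f"{identity:6.1f}%  position {position:>2}  {fragment}")
```

The listing places the exact match first and the one-mismatch fragment second, so the degree of homology of each identified fragment is visible at a glance.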

A variety of comparing means can be used to compare a target sequence or target motif
with the data storage means to identify sequence fragments of the T. pallidum genome. In the
present examples, implementing software which implements the BLAST and BLAZE algorithms,
described in Altschul et al., J. Mol. Biol. 215:403-410 (1990), is used to identify open reading
frames within the T. pallidum genome. A skilled artisan can readily recognize that any one of the
publicly available homology search programs can be used as the search means for the
computer-based systems of the present invention. Of course, suitable proprietary systems that may be
known to those of skill also may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of embodiments of
this aspect of the present invention. The computer system 102 includes a processor 106 connected to
a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as
random access memory, RAM) and a variety of secondary storage devices 110, such as a hard
drive 112 and a removable medium storage device 114. The removable medium storage device
114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive,
etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape,
etc.) containing control logic and/or data recorded therein may be inserted into the removable
medium storage device 114. The computer system 102 includes appropriate software for reading
the control logic and/or the data from the removable medium storage device 114, once it is
inserted into the removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well known manner in
the main memory 108, any of the secondary storage devices 110, and/or a removable storage
medium 116. During execution, software for accessing and processing the genomic sequence
(such as search tools, comparing tools, etc.) resides in main memory 108, in accordance with the
requirements and operating parameters of the operating system, the hardware system and the
software program or programs.

BIOCHEMICAL EMBODIMENTS

Other embodiments of the present invention are directed to isolated fragments of the T.
pallidum genome. The fragments of the T. pallidum genome of the present invention include, but
are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs),
fragments which modulate the expression of an operably linked ORF, hereinafter expression
modulating fragments (EMFs) and fragments which can be used to diagnose the presence of T.
pallidum in a sample, hereinafter diagnostic fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the T.
pallidum genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence
which has been subjected to purification means to reduce, from the composition, the number of
compounds which are normally associated with the composition. Particularly, the term refers to
the nucleic acid molecules having the sequences set out in SEQ ID NOS: 1-744, to representative
fragments thereof as described above including ORF IDs and ORFs, to polynucleotides at least
95%, preferably at least 96%, 97%, 98%, or 99% and especially preferably at least 99.9%
identical in sequence thereto, also as set out above.

A variety of purification means can be used to generate the isolated fragments of the 
present invention. These include, but are not limited to methods which separate constituents of a 
solution based on charge, solubility, or size. 

In one embodiment, T. pallidum DNA can be enzymatically sheared to produce fragments
of 15-20 kb in length. These fragments can then be used to generate a T. pallidum library by
inserting them into lambda clones as described in the Examples below. Primers flanking, for
example, an ORF, such as those enumerated in the ORF IDs of Tables 1-3, can then be generated
using nucleotide sequence information provided in SEQ ID NOS: 1-744. Well known and
routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA
library or T. pallidum genomic DNA. Thus, given the availability of SEQ ID NOS: 1-744, the
information in Tables 1, 2 and 3, and the information that may be obtained readily by analysis of
the sequences of SEQ ID NOS: 1-744 using methods set out above, those of skill will be enabled
by the present disclosure to isolate any ORF-containing or other nucleic acid fragment of the
present invention.

The isolated nucleic acid molecules of the present invention include, but are not limited to,
single stranded and double stranded DNA, and single stranded RNA. For purposes of
numbering and reference to polynucleotide and polypeptide sequences the entire sequence of each
sequence of SEQ ID NOS: 1-744 is included with the first nucleotide being position 1.
Therefore, for reference purposes the numbering used in the present invention is that provided in
the sequence listing for SEQ ID NOS: 1-744.

As used herein, an open reading frame (ORF) means a series of nucleotide triplets
coding for amino acid residues without any termination codons and is a sequence translatable into
protein. Further, unless specified, the term "ORF" for each ORF ID is defined by the termination
codon at the 3' end and the 5' most methionine codon, at the 5' end, in frame with said 3'
termination codon. Unless specified, the term "ORF" also refers to a particular polypeptide
sequence defined by the ORF polynucleotide sequence, wherein the N-terminus is defined by the
5' most methionine codon in frame with the termination codon at the 3' end of the ORF ID and
the C-terminus is defined by the last codon before the said 3' termination codon. As used herein,
an ORF ID represents a sequence without any internal termination codons flanked by termination
codons.
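
The relationship between an ORF ID and the ORF within it, as defined above, can be illustrated with a short sketch. It is offered under stated assumptions rather than as the method actually used to build the tables: only the strand as given is scanned (not the opposite strand), the three termination codons TAA, TAG and TGA are assumed, the ORF coordinates are reported so as to include the 3' termination codon, and the function name and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_ids(sequence):
    """Locate stop-to-stop regions (ORF IDs) and the ORF within each.

    Coordinates are 1-based and inclusive, counted from the 5' end.
    'orf_id' spans the codons between two in-frame termination codons;
    'orf' runs from the 5'-most in-frame ATG through the 3' termination codon.
    Regions containing no ATG are skipped.
    """
    seq = sequence.upper()
    found = []
    for frame in range(3):
        region_start = None  # index of first base after the upstream stop codon
        first_atg = None     # index of the 5'-most ATG in the current region
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if region_start is not None and first_atg is not None:
                    found.append({
                        "frame": frame + 1,
                        "orf_id": (region_start + 1, i),
                        "orf": (first_atg + 1, i + 3),
                    })
                region_start, first_atg = i + 3, None
            elif codon == "ATG" and region_start is not None and first_atg is None:
                first_atg = i
    return found


# Toy sequence, invented for illustration: TAA GCC ATG AAA ATG CCC TGA
print(find_orf_ids("TAAGCCATGAAAATGCCCTGA"))
# [{'frame': 1, 'orf_id': (4, 18), 'orf': (7, 21)}]
```

In the toy sequence the ORF ID covers positions 4 to 18 (everything between the two termination codons), while the ORF begins at the first in-frame methionine codon at position 7; the second ATG at position 13 is ignored because it is not the 5'-most one.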

Tables 1, 2, and 3 list ORF IDs in the T. pallidum genomic contigs of the present
invention that were identified as putative coding regions by the GeneMark software using
organism-specific second-order Markov probability transition matrices. It will be appreciated that
other criteria can be used, in accordance with well known analytical methods, such as those
discussed herein, to generate more inclusive, more restrictive, or more selective lists.

Table 1 sets out ORF IDs in the T. pallidum contigs of the present invention that over a 
continuous region of at least 50 bases are 95% or more identical (by BLAST analysis) to a 
nucleotide sequence available through GenBank in June, 1997. 

Table 2 sets out ORF IDs in the T. pallidum contigs of the present invention that are not
in Table 1 and match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence
available through GenBank in July, 1996.

Table 3 sets out ORF IDs in the T. pallidum contigs of the present invention that do not
match significantly, by BLASTP analysis, a polypeptide sequence available through GenBank in
July, 1996.

In each table, the first and second columns identify the ORF ID by, respectively, contig
number and ORF ID number within the contig; the third column indicates the first nucleotide of
the ORF ID, counting from the 5' end of the contig strand; and the fourth column indicates the
last nucleotide of the ORF ID, counting from the 5' end of the contig strand.

In Tables 1 and 2, column six lists the Reference for the closest matching sequence
available through GenBank. These reference numbers are the database entry numbers
commonly used by those of skill in the art, who will be familiar with their denominations.
Descriptions of the nomenclature are available from the National Center for Biotechnology
Information. Column seven in Tables 1 and 2 provides the gene name of the matching
sequence; column eight provides the BLAST identity score from the comparison of the ORF and
the homologous gene; and column nine indicates the length in nucleotides of the highest scoring
segment pair identified by the BLAST identity analysis.

In Table 3, the last column, column six, indicates the length of each ORF ID in amino 
acid residues. 

The concepts of percent identity and percent similarity of two polypeptide sequences are
well understood in the art. For example, two polypeptides 10 amino acids in length which differ
at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of
70%. However, the same two polypeptides would be deemed to have a percent similarity of
80% if, for example at position 5, the amino acid moieties, although not identical, were
"similar" (i.e., possessed similar biochemical characteristics). Many programs for analysis of
nucleotide or amino acid sequence similarity, such as FASTA and BLAST, specifically list percent
identity of a matching region as an output parameter. Thus, for instance, Tables 1 and 2 herein
enumerate the percent identity of the highest scoring segment pair in each ORF and its listed
relative. Further details concerning the algorithms and criteria used for homology searches are
provided below and are described in the pertinent literature highlighted by the citations provided
below.
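
The 70% identity and 80% similarity figures in the example above can be reproduced with a few lines of code. The sketch below is illustrative only: the two peptides are invented, the comparison is position-by-position without gaps, and the grouping of "similar" residues is a simple assumption made for this example rather than the scoring scheme of FASTA or BLAST.

```python
# Illustrative grouping of biochemically similar residues (an assumption for
# this sketch; alignment programs use substitution matrices instead).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def residues_similar(a, b):
    """True if the residues are identical or fall in the same illustrative group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_and_similarity(first, second):
    """Compare two equal-length polypeptides position by position."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(a == b for a, b in zip(first, second))
    similar = sum(residues_similar(a, b) for a, b in zip(first, second))
    return 100.0 * identical / len(first), 100.0 * similar / len(first)


# Invented 10-residue peptides differing at positions 1, 3 and 5;
# only the position 5 difference (L versus I) is between similar residues.
print(percent_identity_and_similarity("AKGTLEPRSW", "PKWTIEPRSW"))  # (70.0, 80.0)
```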

It will be appreciated that other criteria can be used to generate more inclusive and more
exclusive listings of the types set out in the tables. As those of skill will appreciate, narrow and
broad searches both are useful. Thus, a skilled artisan can readily identify ORFs in contigs of the
T. pallidum genome other than those specified for Tables 1-3, such as ORFs which are
overlapping or encoded by the opposite strand of an identified ORF in addition to those
ascertainable using the computer-based systems of the present invention.

As used herein, an "expression modulating fragment," EMF, means a series of nucleotide
molecules which modulates the expression of an operably linked ORF or EMF.

As used herein, a sequence is said to "modulate the expression of an operably linked
sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs
include, but are not limited to, promoters, and promoter modulating sequences (inducible
elements). One class of EMFs are fragments which induce the expression of an operably linked
ORF in response to a specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the T. pallidum genome by their
proximity to the ORF IDs provided in Tables 1-3 and ORFs within each ORF ID. An intergenic
segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length,
taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably linked
ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein,
an "intergenic segment" refers to fragments of the T. pallidum genome which are between two
ORF(s) herein described. EMFs also can be identified using known EMFs as a target sequence
or target motif in the computer-based systems of the present invention. Further, the two methods
can be combined and used together.

The presence and activity of an EMF can be confirmed using an EMF trap vector. An
EMF trap vector contains a cloning site linked to a marker sequence. A marker sequence encodes
an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic
factor, which can be identified or assayed when the EMF trap vector is placed within an
appropriate host under appropriate conditions. As described above, an EMF will modulate the
expression of an operably linked marker sequence. A more detailed discussion of various marker
sequences is provided below. A sequence which is suspected as being an EMF is cloned in all
three reading frames in one or more restriction sites upstream from the marker sequence in the
EMF trap vector. The vector is then transformed into an appropriate host using known
procedures and the phenotype of the transformed host is examined under appropriate conditions.
As described above, an EMF will modulate the expression of an operably linked marker
sequence.

As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules
which selectively hybridize to T. pallidum sequences. DFs can be readily identified by
identifying unique sequences within contigs of the T. pallidum genome, such as by using
well-known computer analysis software, and by generating and testing probes or amplification
primers consisting of the DF sequence in an appropriate diagnostic format which determines
amplification or hybridization selectivity.

The sequences falling within the scope of the present invention are not limited to the
specific sequences herein described, but also include allelic and species variations thereof. Allelic
and species variations can be routinely determined by comparing the polynucleotide sequences
provided in SEQ ID NOS: 1-744, ORF IDs and ORFs within, a representative fragment thereof,
or a nucleotide sequence at least 99% and preferably 99.9% identical to said polynucleotide
sequences, with a sequence from another isolate of the same species. Furthermore, to
accommodate codon variability, the invention includes nucleic acid molecules coding for the same
amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding
region of an ORF, substitution of one codon for another which encodes the same amino acid is
expressly contemplated.
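
Whether two coding sequences differ only by such synonymous codon substitutions can be checked by translating both and comparing the results. The following is a minimal sketch that assumes the standard genetic code and sequences already in frame from their first base; the function names and the short test sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = termination)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate an in-frame DNA coding sequence up to the first termination codon."""
    seq = coding_sequence.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def encode_same_polypeptide(first, second):
    """True if two coding sequences translate to the identical amino acid sequence."""
    return translate(first) == translate(second)


# GCT -> GCC and AAA -> AAG are synonymous substitutions (Ala and Lys unchanged)
print(encode_same_polypeptide("ATGGCTAAATAA", "ATGGCCAAGTAA"))  # True
# AAA -> GAA changes Lys to Glu
print(encode_same_polypeptide("ATGGCTAAATAA", "ATGGCTGAATAA"))  # False
```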

Any specific sequence disclosed herein can be readily screened for errors by resequencing
a particular fragment, such as an ORF, in both directions (i.e., sequence both strands).
Alternatively, error screening can be performed by sequencing corresponding polynucleotides of
T. pallidum origin isolated by using part or all of the fragments in question as a probe or primer.

Each of the ORFs of the T. pallidum genome within the ORF IDs of Tables 1, 2 and 3,
and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in numerous ways.
For example, the sequences can be used as diagnostic probes or diagnostic amplification primers
to detect the presence of a specific microbe in a sample, particularly T. pallidum. Especially
preferred in this regard are ORFs such as those of Table 3, which do not match previously
characterized sequences from other organisms and thus are most likely to be highly selective for
T. pallidum. Also particularly preferred are ORFs that can be used to distinguish between strains
of T. pallidum, particularly those that distinguish medically important strains, such as
drug-resistant strains.

In addition, the fragments of the present invention, as broadly described, can be used to
control gene expression through triple helix formation or antisense DNA or RNA, both of which
methods are based on the binding of a polynucleotide sequence to DNA or RNA. Triple helix
formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA
hybridization blocks translation of an mRNA molecule into polypeptide. Information from the
sequences of the present invention can be used to design antisense and triple helix-forming
oligonucleotides. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in
length and are designed to be complementary to a region of the gene involved in transcription, for
triple-helix formation, or to the mRNA itself, for antisense inhibition. Both techniques have been
demonstrated to be effective in model systems, and the requisite techniques are well known and
involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al.,
Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al.,
Science 251:1360 (1991). Antisense techniques in general are discussed in, for instance, Okano,
J. Neurochem. 56:560 (1991) and Oligodeoxynucleotides as Antisense Inhibitors of Gene
Expression, CRC Press, Boca Raton, FL (1988).
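
The sequence of an antisense oligonucleotide follows directly from the mRNA region it is meant to bind: it is the reverse complement of that region. The sketch below illustrates only this base-pairing step for a DNA oligonucleotide of the 20 to 40 base length mentioned above. The function name and the mRNA fragment are hypothetical, and considerations that matter in practice (target accessibility, specificity, chemistry) are outside its scope.

```python
# Watson-Crick partners, giving a DNA oligonucleotide; accepts RNA (U) or DNA (T) input
COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def antisense_oligo(mrna, start, length=20):
    """Return a DNA oligonucleotide complementary to a region of the mRNA.

    start  -- 1-based position of the first mRNA base of the target region
    length -- oligonucleotide length; restricted here to the 20 to 40 base
              range given in the text
    The result is written 5' to 3', i.e. as the reverse complement of the region.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    region = mrna.upper()[start - 1:start - 1 + length]
    if start < 1 or len(region) < length:
        raise ValueError("target region extends beyond the mRNA")
    return region.translate(COMPLEMENT)[::-1]


# Invented mRNA fragment, for illustration only
print(antisense_oligo("AUGGCUAAAGGUCUGACCGAU", start=1, length=20))
# TCGGTCAGACCTTTAGCCAT
```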



The present invention further provides recombinant constructs comprising one or more
fragments of the T. pallidum genomic fragments and contigs of the present invention. Certain
preferred recombinant constructs of the present invention comprise a vector, such as a plasmid or
viral vector, into which a fragment of the T. pallidum genome has been inserted, in a forward or
reverse orientation. In the case of a vector comprising one of the ORFs of the present invention,
the vector may further comprise regulatory sequences, including for example, a promoter,
operably linked to the ORF. For vectors comprising the EMFs of the present invention, the
vector may further comprise a marker sequence or heterologous ORF operably linked to the
EMF.

Large numbers of suitable vectors and promoters are known to those of skill in the art and
are commercially available for generating the recombinant constructs of the present invention.
The following vectors are provided by way of example. Useful bacterial vectors include
phagescript, PsiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a
(available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from
Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG
(available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).

Promoter regions can be selected from any desired gene using CAT (chloramphenicol
acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt,
lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine
kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of
the appropriate vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the isolated
fragments of the T. pallidum genomic fragments and contigs of the present invention, wherein
the fragment has been introduced into the host cell using known methods. The host cell can be
a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a
yeast cell, or a procaryotic cell, such as a bacterial cell.

A polynucleotide of the present invention, such as a recombinant construct comprising an
ORF of the present invention, may be introduced into the host by a variety of well established
techniques that are standard in the art, such as calcium phosphate transfection, DEAE-dextran
mediated transfection and electroporation, which are described in, for instance, Davis, L. et al.,
BASIC METHODS IN MOLECULAR BIOLOGY (1986).

A host cell containing one of the fragments of the T. pallidum genomic fragments and
contigs of the present invention can be used in conventional manners to produce the gene
product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a
heterologous protein under the control of the EMF.

The present invention further provides isolated polypeptides encoded by the nucleic acid
fragments of the present invention or by degenerate variants of the nucleic acid fragments of the
present invention. By "degenerate variant" is intended nucleotide fragments which differ from a
nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to
the degeneracy of the Genetic Code, encode an identical polypeptide sequence.

Preferred nucleic acid fragments of the present invention are the ORF IDs depicted in
Tables 2 and 3 and the ORFs within which encode proteins.



A variety of methodologies known in the art can be utilized to obtain any one of the
isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid
sequence can be synthesized using commercially available peptide synthesizers. This is
particularly useful in producing small peptides and fragments of larger polypeptides. Such short
fragments as may be obtained most readily by synthesis are useful, for example, in generating
antibodies against the native polypeptide, as discussed further below.

In an alternative method, the polypeptide or protein is purified from bacterial cells which
naturally produce the polypeptide or protein. One skilled in the art can readily employ
well-known methods for isolating polypeptides and proteins to isolate and purify polypeptides or
proteins of the present invention produced naturally by a bacterial strain, or by other methods.
Methods for isolation and purification that can be employed in this regard include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange
chromatography, and immuno-affinity chromatography.

The polypeptides and proteins of the present invention also can be purified from cells
which have been altered to express the desired polypeptide or protein. As used herein, a cell is
said to be altered to express a desired polypeptide or protein when the cell, through genetic
manipulation, is made to produce a polypeptide or protein which it normally does not produce or
which the cell normally produces at a lower level. Those skilled in the art can readily adapt
procedures for introducing and expressing either recombinant or synthetic sequences into
eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides
or proteins of the present invention.

The polypeptides of the present invention are preferably provided in an isolated form, and
preferably are substantially purified. A recombinantly produced version of the T. pallidum
polypeptide can be substantially purified by the one-step method described by Smith et al. (1988)
Gene 67:31-40. Polypeptides of the invention also can be purified from natural or recombinant
sources using antibodies directed against the polypeptides of the invention in methods which are
well known in the art of protein purification.

The invention further provides for isolated T. pallidum polypeptides comprising an amino
acid sequence selected from the group including: (a) the amino acid sequence of a full-length T.
pallidum polypeptide having the complete amino acid sequence from the first methionine codon to
the termination codon of each sequence listed in SEQ ID NOS: 1-744, wherein said termination
codon is at the end of each SEQ ID NO: and said first methionine is the first methionine in frame
with said termination codon; and (b) the amino acid sequence of a full-length T. pallidum
polypeptide having the complete amino acid sequence in (a) excepting the N-terminal methionine.
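
The definition in (a) and (b) can be illustrated with a short sketch. It is not part of the disclosed method; it assumes the standard genetic code, assumes that an internal in-frame stop codon interrupts the open reading frame, and uses a hypothetical function name and a made-up example sequence.

```python
from typing import Optional

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def full_length_polypeptide(dna: str, drop_initial_met: bool = False) -> Optional[str]:
    """Translate from the first in-frame ATG to the terminal stop codon.

    Returns None when the sequence does not end in a stop codon or holds
    no in-frame ATG upstream of it.
    """
    dna = dna.upper()
    # The reading frame is fixed by the stop codon at the 3' end.
    offset = len(dna) % 3
    codons = [dna[i:i + 3] for i in range(offset, len(dna), 3)]
    if not codons or CODON_TABLE.get(codons[-1]) != "*":
        return None
    body = codons[:-1]
    # Assumption: an internal in-frame stop interrupts the ORF, so the
    # search for the first methionine starts after the last such stop.
    search_from = 0
    for index, codon in enumerate(body):
        if CODON_TABLE.get(codon) == "*":
            search_from = index + 1
    for index in range(search_from, len(body)):
        if body[index] == "ATG":
            # Unknown codons (e.g. containing N) are rendered as X.
            protein = "".join(CODON_TABLE.get(c, "X") for c in body[index:])
            return protein[1:] if drop_initial_met else protein
    return None
```

Calling the function with `drop_initial_met=True` gives the form described in (b), i.e. the same sequence without the N-terminal methionine.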

The polypeptides of the present invention also include polypeptides having an amino acid
sequence at least 80% identical, more preferably at least 90% identical, and still more preferably
95%, 96%, 97%, 98% or 99% identical to those described in (a) and (b) above.

The present invention is further directed to polynucleotides encoding portions or
fragments of the amino acid sequences described herein as well as to portions or fragments of the
isolated amino acid sequences described herein. Fragments include portions of the amino acid
sequences described herein at least 5 contiguous amino acids in length and selected from any two
integers, one of which represents an N-terminal position and the other a C-terminal
position. The initiation codon of the ORFs of the present invention is position 1. The initiation
codon (position 1) for purposes of the present invention is the first methionine codon of each
ORF ID which is in frame with the termination codon at the end of each said sequence. Every
combination of an N-terminal and C-terminal position that a fragment at least 5 contiguous amino
acid residues in length could occupy on any given ORF is included in the invention, i.e., from
initiation codon up to the termination codon. "At least" means a fragment may be 5 contiguous
amino acid residues in length or any integer between 5 and the number of residues in an ORF,
minus 1. Therefore, included in the invention are contiguous fragments specified by any
N-terminal and C-terminal positions of amino acid sequence set forth in SEQ ID NOS: 1-744 or
Tables 1-3 wherein the contiguous fragment is any integer between 5 and the number of residues
in an ORF minus 1.
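
As an illustration only, the set of fragments defined above can be enumerated as follows. The sketch assumes 1-based positions with the initiating methionine at position 1, as stated above; the function names and the eight-residue example sequence are hypothetical.

```python
from typing import Iterator, Tuple


def iter_fragments(sequence: str, min_len: int = 5) -> Iterator[Tuple[int, int, str]]:
    """Yield (N-terminal position, C-terminal position, fragment).

    Positions are 1-based. Fragment lengths run from min_len up to the
    number of residues minus 1, so the full-length sequence is excluded.
    """
    n = len(sequence)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            yield start + 1, start + length, sequence[start:start + length]


def count_fragments(n_residues: int, min_len: int = 5) -> int:
    """Number of distinct (N-terminal, C-terminal) position pairs."""
    return sum(n_residues - length + 1 for length in range(min_len, n_residues))


if __name__ == "__main__":
    for n_pos, c_pos, fragment in iter_fragments("MKTAYIAK"):
        print(n_pos, c_pos, fragment)
```

For an ORF of n residues, each permitted length L contributes n - L + 1 position pairs, which is what `count_fragments` sums.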

Further, the invention includes polypeptides comprising fragments specified by size, in
amino acid residues, rather than by N-terminal and C-terminal positions. The invention includes
any fragment size, in contiguous amino acid residues, selected from integers between 5 and the
number of residues in an ORF, minus 1. Preferred sizes of contiguous polypeptide fragments
include about 5 amino acid residues, about 10 amino acid residues, about 20 amino acid residues,
about 30 amino acid residues, about 40 amino acid residues, about 50 amino acid residues, about
100 amino acid residues, about 200 amino acid residues, about 300 amino acid residues, and
about 400 amino acid residues. The preferred sizes are, of course, meant to exemplify, not limit,
the present invention as all size fragments representing any integer between 5 and the number of
residues in a full length sequence minus 1 are included in the invention. The present invention
also provides for the exclusion of any fragments specified by N-terminal and C-terminal
positions or by size in amino acid residues as described above. Any number of fragments
specified by N-terminal and C-terminal positions or by size in amino acid residues as described
above may be excluded.

The above fragments need not be active since they would be useful, for example, in
immunoassays, in epitope mapping, epitope tagging, to generate antibodies to a particular portion
of the protein, as vaccines, and as molecular weight markers.

Further polypeptides of the present invention include polypeptides which have at least 
90% similarity, more preferably at least 95% similarity, and still more preferably at least 96%, 
97%, 98% or 99% similarity to those described above. 

A further embodiment of the invention relates to a polypeptide which comprises the amino
acid sequence of a T. pallidum polypeptide having an amino acid sequence which contains at least
one conservative amino acid substitution, but not more than 50 conservative amino acid
substitutions, not more than 40 conservative amino acid substitutions, not more than 30
conservative amino acid substitutions, and not more than 20 conservative amino acid
substitutions. Also provided are polypeptides which comprise the amino acid sequence of a T.
pallidum polypeptide, having at least one, but not more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1
conservative amino acid substitutions.

By a polypeptide having an amino acid sequence at least, for example, 95% "identical" to
a query amino acid sequence of the present invention, it is intended that the amino acid sequence
of the subject polypeptide is identical to the query sequence except that the subject polypeptide
sequence may include up to five amino acid alterations per each 100 amino acids of the query
amino acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at
least 95% identical to a query amino acid sequence, up to 5% of the amino acid residues in the
subject sequence may be inserted, deleted (indels), or substituted with another amino acid. These
alterations of the reference sequence may occur at the amino or carboxy terminal positions of the
reference amino acid sequence or anywhere between those terminal positions, interspersed
individually among residues in the reference sequence.

As a practical matter, whether any particular polypeptide is at least 90%, 95%, 96%,
97%, 98% or 99% identical to the ORF amino acid sequences encoded by the sequences of SEQ
ID NOS: 1-744, as described herein, can be determined conventionally using known computer
programs. A preferred method for determining the best overall match between a query sequence
(a sequence of the present invention) and a subject sequence, also referred to as a global sequence
alignment, can be determined using the FASTDB computer program based on the algorithm of
Brutlag et al., (1990) Comp. App. Biosci. 6:237-245. In a sequence alignment the query and
subject sequences are both amino acid sequences. The result of said global sequence alignment is
in percent identity. Preferred parameters used in a FASTDB amino acid alignment are:
Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group
Length=0, Cutoff Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the subject amino acid sequence, whichever is
shorter.

If the subject sequence is shorter than the query sequence due to N- or C-terminal
deletions, not because of internal deletions, the results, in percent identity, must be manually
corrected. This is because the FASTDB program does not account for N- and C-terminal
truncations of the subject sequence when calculating global percent identity. For subject
sequences truncated at the N- and C-termini, relative to the query sequence, the percent identity is
corrected by calculating the number of residues of the query sequence that are N- and C-terminal
of the subject sequence, which are not matched/aligned with a corresponding subject residue, as a
percent of the total residues of the query sequence. Whether a residue is matched/aligned is
determined by results of the FASTDB sequence alignment. This percentage is then subtracted
from the percent identity, calculated by the above FASTDB program using the specified
parameters, to arrive at a final percent identity score. This final percent identity score is what is
used for the purposes of the present invention. Only residues to the N- and C-termini of the
subject sequence, which are not matched/aligned with the query sequence, are considered for the
purposes of manually adjusting the percent identity score. That is, only query amino acid
residues outside the farthest N- and C-terminal residues of the subject sequence are considered.

For example, a 90 amino acid residue subject sequence is aligned with a 100 residue
query sequence to determine percent identity. The deletion occurs at the N-terminus of the
subject sequence and therefore, the FASTDB alignment does not match/align with the first 10
residues at the N-terminus. The 10 unpaired residues represent 10% of the sequence (number of
residues at the N- and C-termini not matched/total number of residues in the query sequence) so
10% is subtracted from the percent identity score calculated by the FASTDB program. If the
remaining 90 residues were perfectly matched the final percent identity would be 90%. In
another example, a 90 residue subject sequence is compared with a 100 residue query sequence.
This time the deletions are internal so there are no residues at the N- or C-termini of the subject
sequence which are not matched/aligned with the query. In this case the percent identity
calculated by FASTDB is not manually corrected. Once again, only residue positions outside the
N- and C-terminal ends of the subject sequence, as displayed in the FASTDB alignment, which
are not matched/aligned with the query sequence are manually corrected. No other manual
corrections are to be made for the purposes of the present invention.
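
The manual correction described above reduces to a single subtraction, sketched here for illustration. The function and parameter names are hypothetical; the FASTDB percent identity itself is taken as an input and is not computed by this sketch.

```python
def corrected_percent_identity(
    fastdb_identity: float,
    query_length: int,
    n_terminal_unmatched: int = 0,
    c_terminal_unmatched: int = 0,
) -> float:
    """Apply the manual N-/C-terminal truncation correction.

    fastdb_identity is the percent identity reported by the alignment
    program. The unmatched counts are query residues lying outside the
    farthest N- and C-terminal residues of the subject sequence.
    Internal gaps are not counted here, in keeping with the text.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if n_terminal_unmatched < 0 or c_terminal_unmatched < 0:
        raise ValueError("unmatched residue counts must be non-negative")
    overhang = n_terminal_unmatched + c_terminal_unmatched
    penalty = 100.0 * overhang / query_length
    return fastdb_identity - penalty


if __name__ == "__main__":
    # First worked example: 10 unmatched N-terminal residues of a
    # 100-residue query, remaining 90 residues perfectly matched.
    print(corrected_percent_identity(100.0, 100, n_terminal_unmatched=10))
    # Second worked example: internal deletions only, no correction.
    print(corrected_percent_identity(90.0, 100))
```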

The above polypeptide sequences are included irrespective of whether they have their
normal biological activity. This is because even where a particular polypeptide molecule does not
have biological activity, one of skill in the art would still know how to use the polypeptide, for
instance, as a vaccine or to generate antibodies. Other uses of the polypeptides of the present
invention that do not have T. pallidum activity include, inter alia, as epitope tags, in epitope
mapping, and as molecular weight markers on SDS-PAGE gels or on molecular sieve gel
filtration columns using methods known to those of skill in the art.

As described below, the polypeptides of the present invention can also be used to raise polyclonal
and monoclonal antibodies, which are useful in assays for detecting T. pallidum protein
expression or as agonists and antagonists capable of enhancing or inhibiting T. pallidum protein
function. Further, such polypeptides can be used in the yeast two-hybrid system to "capture" T.
pallidum protein binding proteins which are also candidate agonists and antagonists according to
the present invention. See, e.g., Fields et al. (1989) Nature 340:245-246.

Any host/vector system can be used to express one or more of the ORFs of the present
invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells,
COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most
preferred cells are those which do not normally express the particular polypeptide or protein or
which express the polypeptide or protein at a low natural level.



"Recombinant," as used herein, means that a polypeptide or protein is derived from
recombinant (e.g., microbial or mammalian) expression systems. "Microbial" refers to
recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression
systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free
of native endogenous substances and unaccompanied by associated native glycosylation.
Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of
glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation
pattern different from that expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally,
DNA segments encoding the polypeptides and proteins provided by this invention are assembled
from fragments of the T. pallidum genome and short oligonucleotide linkers, or from a series of
oligonucleotides, to provide a synthetic gene which is capable of being expressed in a
recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral
operon.

"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or
vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can
comprise a transcriptional unit comprising an assembly of (1) a genetic regulatory element or
elements necessary for gene expression in the host, including elements required to initiate and
maintain transcription at a level sufficient for suitable expression of the desired polypeptide,
including, for example, promoters and, where necessary, an enhancer and a polyadenylation
signal; (2) a structural or coding sequence which is transcribed into mRNA and translated into
protein; and (3) appropriate signals to initiate translation at the beginning of the desired coding
region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic
expression systems preferably include a leader sequence enabling extracellular secretion of
translated protein by a host cell. Alternatively, where recombinant protein is expressed without a
leader or transport sequence, it may include an N-terminal methionine residue. This residue may
or may not be subsequently cleaved from the expressed recombinant protein to provide a final
product.

"Recombinant expression system" means host cells which have stably integrated a
recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional
unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression
systems as defined herein will express heterologous polypeptides or proteins upon induction of
the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under
the control of appropriate promoters. Cell-free translation systems can also be employed to
produce such proteins using RNAs derived from the DNA constructs of the present invention.
Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are
described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of
which is hereby incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene
of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to
direct transcription of a downstream structural sequence. Such promoters can be derived from
operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor,
acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is
assembled in appropriate phase with translation initiation and termination sequences, and
preferably, a leader sequence capable of directing secretion of translated protein into the
periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a
fusion protein including an N-terminal identification peptide imparting desired characteristics,
e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA
sequence encoding a desired protein together with suitable translation initiation and termination
signals in operable reading phase with a functional promoter. The vector will comprise one or
more phenotypic selectable markers and an origin of replication to ensure maintenance of the
vector and, when desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of E. coli, B. subtilis,
Salmonella typhimurium and various species within the genera Pseudomonas and Streptomyces.
Others may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for bacterial use
can comprise a selectable marker and bacterial origin of replication derived from commercially
available plasmids comprising genetic elements of the well known cloning vector pBR322
(ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available from
Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec,
Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate
promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an
appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by
appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an
additional period to provide for expression of the induced gene product. Thereafter cells are
typically harvested, generally by centrifugation, disrupted to release expressed protein, generally
by physical or chemical means, and the resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express recombinant
protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney
fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing
a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable promoter
and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor
and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed
sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin,
early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by
initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or
size exclusion chromatography steps. Microbial cells employed in expression of proteins can be
disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical
disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in
completing configuration of the mature protein. Finally, high performance liquid
chromatography (HPLC) can be employed for final purification steps.

The present invention further includes isolated polypeptides, proteins and nucleic acid
molecules which are substantially equivalent to those herein described. As used herein,
substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a
mutant sequence, that varies from a reference sequence by one or more substitutions, deletions,
or additions, the net effect of which does not result in an adverse functional dissimilarity between
reference and subject sequences. For purposes of the present invention, sequences having
equivalent biological activity and equivalent expression characteristics are considered
substantially equivalent. For purposes of determining equivalence, truncation of the mature
sequence should be disregarded.

The invention further provides methods of obtaining homologs from other strains of T.
pallidum, of the fragments of the T. pallidum genome of the present invention and homologs of
the proteins encoded by the ORFs of the present invention. As used herein, a sequence or
protein of T. pallidum is defined as a homolog of a fragment of the T. pallidum fragments or
contigs or a protein encoded by one of the ORFs of the present invention, if it shares significant
homology to one of the fragments of the T. pallidum genome of the present invention or a protein
encoded by one of the ORFs of the present invention. Specifically, by using the sequence
disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share significant
homology" if the two contain regions which possess greater than 85% sequence (amino acid or
nucleic acid) homology. Preferred homologs in this regard are those with more than 90%
homology. Especially preferred are those with 93% or more homology. Among especially
preferred homologs those with 95% or more homology are particularly preferred. Very
particularly preferred among these are those with 97% and even more particularly preferred
among those are homologs with 99% or more homology. The most preferred homologs among
these are those with 99.9% homology or more. It will be understood that, among measures of
homology, identity is particularly preferred in this regard.
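
The graded thresholds above can be summarized as a lookup, sketched here for illustration. The tier labels paraphrase the text, the function name is hypothetical, and the 97% tier is assumed to mean "97% or more" because the text does not say whether that bound is inclusive.

```python
# (threshold in percent, inclusive bound?, label); highest tier first.
HOMOLOGY_TIERS = [
    (99.9, True, "most preferred"),
    (99.0, True, "even more particularly preferred"),
    (97.0, True, "very particularly preferred"),  # inclusive by assumption
    (95.0, True, "particularly preferred"),
    (93.0, True, "especially preferred"),
    (90.0, False, "preferred"),             # "more than 90%"
    (85.0, False, "significant homology"),  # "greater than 85%"
]


def homology_tier(percent: float) -> str:
    """Return the highest tier whose threshold the value satisfies."""
    for threshold, inclusive, label in HOMOLOGY_TIERS:
        if percent > threshold or (inclusive and percent == threshold):
            return label
    return "below threshold"


if __name__ == "__main__":
    for value in (84.0, 88.5, 93.0, 99.95):
        print(value, homology_tier(value))
```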

Region specific primers or probes derived from the nucleotide sequence provided in SEQ
ID NOS: 1-744 or from a nucleotide sequence at least 95%, particularly at least 99%, especially
at least 99.5% identical to a sequence of SEQ ID NOS: 1-744 can be used to prime DNA
synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding
a homolog. Methods suitable to this aspect of the present invention are well known and have
been described in great detail in many publications such as, for example, Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990).

When using primers derived from SEQ ID NOS: 1-744 or from a nucleotide sequence
having an aforementioned identity to a sequence of SEQ ID NOS: 1-744, one skilled in the art will
recognize that by employing high stringency conditions (e.g., annealing at 50-60°C in 6X SSPC
and 50% formamide, and washing at 50-65°C in 0.5X SSPC) only sequences which are greater
than 75% homologous to the primer will be amplified. By employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at
42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous to the primer will
also be amplified.

When using DNA probes derived from SEQ ID NOS: 1-744, or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-744, for
colony/plaque hybridization, one skilled in the art will recognize that by employing high
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90%
homologous to the probe can be obtained, and that by employing lower stringency conditions
(e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X
SSPC), sequences having regions which are greater than 35-45% homologous to the probe will
be obtained.
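
For convenience, the example primer and probe conditions given above can be collected into a small lookup table. This is only a restatement of the stated example values; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    hybridization: str   # annealing or hybridization step
    wash: str            # washing step
    homology_floor: str  # homology to primer/probe, as stated in the text


# Keys are (reagent, stringency level); values restate the text's examples.
CONDITIONS = {
    ("primer", "high"): Stringency(
        "50-60°C in 6X SSPC and 50% formamide", "50-65°C in 0.5X SSPC", ">75%"),
    ("primer", "low"): Stringency(
        "35-37°C in 5X SSPC and 40-45% formamide", "42°C in 0.5X SSPC", ">40-50%"),
    ("probe", "high"): Stringency(
        "50-65°C in 5X SSPC and 50% formamide", "50-65°C in 0.5X SSPC", ">90%"),
    ("probe", "low"): Stringency(
        "35-37°C in 5X SSPC and 40-45% formamide", "42°C in 0.5X SSPC", ">35-45%"),
}


def conditions_for(reagent: str, level: str) -> Stringency:
    """Look up the example conditions; raises KeyError for unknown keys."""
    return CONDITIONS[(reagent, level)]


if __name__ == "__main__":
    print(conditions_for("probe", "high"))
```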

Any organism can be used as the source for homologs of the present invention so long as
the organism naturally expresses such a protein or contains genes encoding the same. The most
preferred organisms for isolating homologs are bacteria which are closely related to T. pallidum.



ILLUSTRATIVE USES OF COMPOSITIONS 
OF THE INVENTION 

Each ORF corresponding to the ORF IDs provided in Tables 1 and 2 is identified with a
function by homology to a known gene or polypeptide. As a result, one skilled in the art can use
the polypeptides of the present invention for commercial, therapeutic and industrial purposes
consistent with the type of putative identification of the polypeptide. Such identifications permit
one skilled in the art to use the T. pallidum ORFs in a manner similar to the known type of
sequences for which the identification is made; for example, to ferment a particular sugar source
or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the
invention are available, including the following reviews on the industrial use of enzymes, for
example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd
Ed., MacMillan Publications, Ltd. NY (1991) and BIOCATALYSTS IN ORGANIC
SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands
(1985). A variety of exemplary uses that illustrate this and similar aspects of the present
invention are discussed below.

1. Biosynthetic Enzymes

Open reading frames encoding proteins that mediate the catalytic reactions
involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules,
cellular processes and other functions can be used for industrial biosynthesis. These include
enzymes involved in the degradation of the intermediary products of metabolism, enzymes
involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and
anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force
conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid
synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and
vitamin synthesis.

The various metabolic pathways present in T. pallidum can be identified based on
absolute nutritional requirements as well as by examining the various enzymes identified in Tables
1-3 and SEQ ID NOS: 1-744.

Of particular interest are polypeptides involved in the degradation of intermediary
metabolites as well as non-macromolecular metabolism. Such enzymes include amylases,
glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes. Proteolytic
enzymes find use in a number of industrial processes including the processing of flax and other
vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction
of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A
detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et
al., Symbiosis 21:19 (1986) and Voragen et al. in Biocatalysts In Agricultural Biotechnology,
Whitaker et al., Eds., American Chemical Society Symposium Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of T.
pallidum. Enzymes involved in the degradation of sugars, such as, particularly, glucose,
galactose, fructose and xylose, can be used in industrial fermentation. Some of the important
sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as
glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose
oxidases which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial
production of ascorbic acid using the Reichstein's procedure, as described in Krueger et al.,
Biotechnology 6(A), Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in purified form as
well as in an immobilized form for the deoxygenation of beer. See, for instance, Hartmeir et al.,
Biotechnology Letters 7:21 (1979). The most important application of GOD is the industrial
scale fermentation of gluconic acid. Gluconic acids are marketed for use in the detergent,
textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as described, for
example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI,
Benett et al., Eds., Academic Press, New York (1985). In addition to industrial applications,
GOD has found applications in medicine for quantitative determination of glucose in body fluids
and, recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This
application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986), for
instance.

The main sweetener used in the world today is sugar which comes from sugar beets and
sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest
expansion in the market today. Initially, soluble enzymes were used and later immobilized
enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial
Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the
use of glucose-produced high fructose syrups is by far the largest industrial business using
immobilized enzymes. A review of the industrial use of these enzymes is provided by
Jorgensen, Starch 40:301 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus
represent one of the largest volumes of microbial enzymes used in the industrial sector. Because
of their industrial importance, there is a large body of published and unpublished information
regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases
Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey et
al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report
Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).

Another class of commercially usable proteins of the present invention are the microbial
lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Chiral
Society of London 310:221 (1985) and Poserke, Journal of the American Oil Chemist Society
61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral
glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications
of lipases include the use as a detergent additive to facilitate the removal of fats from fabrics in the
course of the washing procedures.



The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the
synthesis of complex organic molecules is gaining popularity at a great rate. One area of great
interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest
to a wide range of synthetic chemists, particularly those scientists involved with the preparation of
new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent
Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton,
Florida (1990)). The following reactions catalyzed by enzymes are of interest to organic
chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles,
esterification reactions, trans-esterification reactions, synthesis of amides, reduction of alkanones
and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to
sulfoxides, and carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the present
invention for biotransformation and organic synthesis, it is sometimes necessary to consider the
respective advantages and disadvantages of using a microorganism as opposed to an isolated
enzyme. The pros and cons of using a whole cell system on the one hand or an isolated, partially
purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in
Britain (1987), p. 127.

Amino transferases, enzymes involved in the biosynthesis and metabolism of amino
acids, are useful in the catalytic production of amino acids. An advantage of using microbial
based enzyme systems is that the amino transferase enzymes catalyze the stereo-selective
synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A
description of the use of amino transferases for amino acid production is provided by
Roselle-David, Methods of Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present invention includes
enzymes involved in nucleic acid synthesis, repair, and recombination.

2. Generation of Antibodies 
As described here, the proteins of the present invention, as well as homologs thereof, can
be used in a variety of procedures and methods known in the art which are currently applied to
other proteins. The proteins of the present invention can further be used to generate an antibody
which selectively binds the protein.

T. pallidum protein-specific antibodies for use in the present invention can be raised
against the intact T. pallidum protein or an antigenic polypeptide fragment thereof, which may be
presented together with a carrier protein, such as an albumin, to an animal system (such as rabbit
or mouse) or, if it is long enough (at least about 25 amino acids), without a carrier.

As used herein, the term "antibody" (Ab) or "monoclonal antibody" (Mab) is meant to
include intact molecules, single chain whole antibodies, and antibody fragments. Antibody
fragments of the present invention include Fab and F(ab')2 and other fragments including
single-chain Fvs (scFv) and disulfide-linked Fvs (sdFv). Also included in the present invention are
chimeric and humanized monoclonal antibodies and polyclonal antibodies specific for the
polypeptides of the present invention. The antibodies of the present invention may be prepared
by any of a variety of methods. For example, cells expressing a polypeptide of the present
invention or an antigenic fragment thereof can be administered to an animal in order to induce the
production of sera containing polyclonal antibodies. For example, a preparation of T. pallidum
polypeptide or fragment thereof is prepared and purified to render it substantially free of natural
contaminants. Such a preparation is then introduced into an animal in order to produce
polyclonal antisera of greater specific activity.

In a preferred method, the antibodies of the present invention are monoclonal antibodies 
or binding fragments thereof. Such monoclonal antibodies can be prepared using hybridoma 
technology. See, e.g., Harlow et al., ANTIBODIES: A LABORATORY MANUAL, (Cold
Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling, et al., in: MONOCLONAL
ANTIBODIES AND T-CELL HYBRIDOMAS 563-681 (Elsevier, N.Y., 1981). Fab and
F(ab')2 fragments may be produced by proteolytic cleavage, using enzymes such as papain (to
produce Fab fragments) or pepsin (to produce F(ab')2 fragments). Alternatively, T. pallidum
polypeptide-binding fragments, chimeric, and humanized antibodies can be produced through the
application of recombinant DNA technology or through synthetic chemistry using methods
known in the art.

Alternatively, additional antibodies capable of binding to the polypeptide antigen of the 
present invention may be produced in a two-step procedure through the use of anti-idiotypic 
antibodies. Such a method makes use of the fact that antibodies are themselves antigens, and 
that, therefore, it is possible to obtain an antibody which binds to a second antibody. In
accordance with this method, T. pallidum polypeptide-specific antibodies are used to immunize
an animal, preferably a mouse. The splenocytes of such an animal are then used to produce
hybridoma cells, and the hybridoma cells are screened to identify clones which produce an
antibody whose ability to bind to the T. pallidum polypeptide-specific antibody can be blocked
by the T. pallidum polypeptide antigen. Such antibodies comprise anti-idiotypic antibodies to
the T. pallidum polypeptide-specific antibody and can be used to immunize an animal to induce
formation of further T. pallidum polypeptide-specific antibodies.

Antibodies and fragments thereof of the present invention may be described by the
portion of a polypeptide of the present invention recognized or specifically bound by the
antibody. Antibody binding fragments of a polypeptide of the present invention may be
described or specified in the same manner as for polypeptide fragments discussed above, i.e.,
by N-terminal and C-terminal positions or by size in contiguous amino acid residues. Any
number of antibody binding fragments of a polypeptide of the present invention, specified by
N-terminal and C-terminal positions or by size in amino acid residues, as described above, may also
be excluded from the present invention. Therefore, the present invention includes antibodies that
specifically bind a particularly described fragment of a polypeptide of the present invention and
allows for the exclusion of the same.

Antibodies and fragments thereof of the present invention may also be described or specified in
terms of their cross-reactivity. Antibodies and fragments that do not bind polypeptides of any
other species of Borrelia other than T. pallidum are included in the present invention. Likewise,
antibodies and fragments that bind only species of Borrelia, i.e., antibodies and fragments that
do not bind bacteria from any genus other than Borrelia, are included in the present invention.

The present invention further provides the above-described antibodies in detectably
labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc.
Procedures for accomplishing such labeling are well-known in the art, for example see
(Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym.
62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol.
Meth. 13:215 (1976)).

The labeled antibodies of the present invention can be used for in vitro, in vivo, and in 
situ assays to identify cells or tissues in which a fragment of the T. pallidum genome is 
expressed. 

The present invention further provides the above-described antibodies immobilized on a
solid support. Examples of such solid supports include plastics such as polycarbonate, complex
carbohydrates such as agarose and sepharose, acrylic resins such as polyacrylamide, and latex
beads. Techniques for coupling antibodies to such solid supports are well known in the art
(Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34
Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be
used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the
proteins of the present invention.

3. Epitope-Bearing Portions 

In another aspect, the invention provides peptides and polypeptides comprising
epitope-bearing portions of the T. pallidum polypeptides of the present invention. These epitopes
are immunogenic or antigenic epitopes of the polypeptides of the present invention. An
"immunogenic epitope" is defined as a part of a protein that elicits an antibody response when the
whole protein or polypeptide is the immunogen. These immunogenic epitopes are believed to be
confined to a few loci on the molecule. On the other hand, a region of a protein molecule to
which an antibody can bind is defined as an "antigenic determinant" or "antigenic epitope." The
number of immunogenic epitopes of a protein generally is less than the number of antigenic
epitopes. See, e.g., Geysen, et al. (1983) Proc. Natl. Acad. Sci. USA 81:3998-4002. Amino
acid residues comprising antigenic epitopes may be determined by algorithms such as the
Jameson-Wolf analysis or similar algorithms or by in vivo testing for an antigenic response using
the methods described herein or those known in the art.

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that
contain a region of a protein molecule to which an antibody can bind), it is well known in the art
that relatively short synthetic peptides that mimic part of a protein sequence are routinely capable
of eliciting an antiserum that reacts with the partially mimicked protein. See, e.g., Sutcliffe, et
al., (1983) Science 219:660-666. Peptides capable of eliciting protein-reactive sera are
frequently represented in the primary sequence of a protein, can be characterized by a set of
simple chemical rules, and are confined neither to immunodominant regions of intact proteins
(i.e., immunogenic epitopes) nor to the amino or carboxyl terminals. Peptides that are extremely
hydrophobic and those of six or fewer residues generally are ineffective at inducing antibodies
that bind to the mimicked protein; longer peptides, especially those containing proline residues,
usually are effective. See, Sutcliffe, et al., supra, p. 661. For instance, 18 of 20 peptides
designed according to these guidelines, containing 8-39 residues covering 75% of the sequence
of the influenza virus hemagglutinin HA1 polypeptide chain, induced antibodies that reacted with
the HA1 protein or intact virus; and 12/12 peptides from the MuLV polymerase and 18/18 from
the rabies glycoprotein induced antibodies that precipitated the respective proteins.
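
The selection rules summarized above (avoid peptides of six or fewer residues, avoid extremely hydrophobic peptides, favor proline-containing peptides) can be sketched as a simple sliding-window screen. This is an illustrative sketch only: the Kyte-Doolittle hydropathy scale, the mean-hydropathy cutoff of 0.0, the window length, and the function names are assumptions introduced for the example and are not taken from the cited references.

```python
# Illustrative sketch only. The hydropathy scale (Kyte-Doolittle), the cutoff,
# and all names below are assumptions chosen for this example.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average hydropathy of a peptide; higher means more hydrophobic."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def screen_peptides(protein, length=10, max_hydropathy=0.0):
    """Return (1-based start, peptide, mean hydropathy, has_proline) tuples
    for every window that is not too hydrophobic."""
    if length <= 6:
        raise ValueError("peptides of six or fewer residues are generally ineffective")
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        score = mean_hydropathy(peptide)
        if score <= max_hydropathy:
            hits.append((start + 1, peptide, score, "P" in peptide))
    # Proline-containing peptides first, then the most hydrophilic.
    hits.sort(key=lambda hit: (not hit[3], hit[2]))
    return hits
```

Such a screen only ranks candidate windows; whether a given peptide actually elicits protein-reactive antibodies must still be determined experimentally.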

Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful
to raise antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the
invention. Thus, a high proportion of hybridomas obtained by fusion of spleen cells from
donors immunized with an antigen epitope-bearing peptide generally secrete antibody reactive
with the native protein. See Sutcliffe, et al., supra, p. 663. The antibodies raised by antigenic
epitope-bearing peptides or polypeptides are useful to detect the mimicked protein, and antibodies
to different peptides may be used for tracking the fate of various regions of a protein precursor
which undergoes post-translational processing. The peptides and anti-peptide antibodies may be
used in a variety of qualitative or quantitative assays for the mimicked protein, for instance in
competition assays, since it has been shown that even short peptides (e.g., about 9 amino acids)
can bind and displace the larger peptides in immunoprecipitation assays. See, e.g., Wilson, et
al., (1984) Cell 37:767-778. The anti-peptide antibodies of the invention also are useful for
purification of the mimicked protein, for instance, by adsorption chromatography using methods
known in the art.

Antigenic epitope-bearing peptides and polypeptides of the invention designed according
to the above guidelines preferably contain a sequence of at least seven, more preferably at least
nine and most preferably between about 10 to about 50 amino acids (i.e., any integer between 7
and 50) contained within the amino acid sequence of a polypeptide of the invention. However,
peptides or polypeptides comprising a larger portion of an amino acid sequence of a polypeptide
of the invention, containing about 50 to about 100 amino acids, or any length up to and including
the entire amino acid sequence of a polypeptide of the invention, also are considered
epitope-bearing peptides or polypeptides of the invention and also are useful for inducing
antibodies that react with the mimicked protein. Preferably, the amino acid sequence of the
epitope-bearing peptide is selected to provide substantial solubility in aqueous solvents (i.e., the
sequence includes relatively hydrophilic residues and highly hydrophobic sequences are
preferably avoided); and sequences containing proline residues are particularly preferred.

The epitope-bearing peptides and polypeptides of the present invention may be produced
by any conventional means for making peptides or polypeptides including recombinant means
using nucleic acid molecules of the invention. For instance, an epitope-bearing amino acid
sequence of the present invention may be fused to a larger polypeptide which acts as a carrier
during recombinant production and purification, as well as during immunization to produce
anti-peptide antibodies. Epitope-bearing peptides also may be synthesized using known methods
of chemical synthesis. For instance, Houghten has described a simple method for synthesis of
large numbers of peptides, such as 10-20 mg of 248 different 13 residue peptides representing
single amino acid variants of a segment of the HA1 polypeptide which were prepared and
characterized (by ELISA-type binding studies) in less than four weeks (Houghten, R. A. Proc.
Natl. Acad. Sci. USA 82:5131-5135 (1985)). This "Simultaneous Multiple Peptide Synthesis
(SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten and coworkers
(1986). In this procedure the individual resins for the solid-phase synthesis of various peptides
are contained in separate solvent-permeable packets, enabling the optimal use of the many
identical repetitive steps involved in solid-phase methods. A completely manual procedure
allows 500-1000 or more syntheses to be conducted simultaneously (Houghten et al. (1985)
Proc. Natl. Acad. Sci. 82:5131-5135 at 5134).
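
The figure of 248 peptides is consistent with one 13-residue parent peptide plus every single-residue substitution of it (13 positions x 19 alternative residues = 247 variants). That reading is an inference from the arithmetic, not a statement in the text above. A minimal sketch enumerating such a variant set follows; the parent sequence is an arbitrary placeholder rather than the actual HA1 segment, and the function name is hypothetical.

```python
# Illustrative sketch: enumerate all single amino acid variants of a peptide.
# The parent sequence is an arbitrary 13-residue placeholder, not the HA1 segment.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_residue_variants(parent):
    """Return every peptide differing from `parent` at exactly one position."""
    variants = []
    for position, original in enumerate(parent):
        for residue in AMINO_ACIDS:
            if residue != original:
                variants.append(parent[:position] + residue + parent[position + 1:])
    return variants


parent_peptide = "ACDEFGHIKLMNP"  # placeholder, 13 residues
variant_set = single_residue_variants(parent_peptide)
total_peptides = len(variant_set) + 1  # variants plus the parent itself
```

With a 13-residue parent, `variant_set` holds 247 distinct peptides, so parent plus variants gives 248.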

Epitope-bearing peptides and polypeptides of the invention are used to induce antibodies
according to methods well known in the art. See, e.g., Sutcliffe, et al., supra; Wilson, et al.,
supra; and Bittle, et al. (1985) J. Gen. Virol. 66:2347-2354. Generally, animals may be
immunized with free peptide; however, anti-peptide antibody titer may be boosted by coupling of
the peptide to a macromolecular carrier, such as keyhole limpet hemocyanin (KLH) or tetanus
toxoid. For instance, peptides containing cysteine may be coupled to carrier using a linker such
as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other peptides may be
coupled to carrier using a more general linking agent such as glutaraldehyde. Animals such as
rabbits, rats and mice are immunized with either free or carrier-coupled peptides, for instance, by
intraperitoneal and/or intradermal injection of emulsions containing about 100 µg peptide or
carrier protein and Freund's adjuvant. Several booster injections may be needed, for instance, at
intervals of about two weeks, to provide a useful titer of anti-peptide antibody which can be
detected, for example, by ELISA assay using free peptide adsorbed to a solid surface. The titer
of anti-peptide antibodies in serum from an immunized animal may be increased by selection of
anti-peptide antibodies, for instance, by adsorption to the peptide on a solid support and elution
of the selected antibodies according to methods well known in the art.

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that
elicit an antibody response when the whole protein is the immunogen, are identified according to
methods known in the art. For instance, Geysen, et al., supra, discloses a procedure for rapid
concurrent synthesis on solid supports of hundreds of peptides of sufficient purity to react in an
ELISA. Interaction of synthesized peptides with antibodies is then easily detected without
removing them from the support. In this manner a peptide bearing an immunogenic epitope of a
desired protein may be identified routinely by one of ordinary skill in the art. For instance, the
immunologically important epitope in the coat protein of foot-and-mouth disease virus was
located by Geysen et al., supra, with a resolution of seven amino acids by synthesis of an
overlapping set of all 208 possible hexapeptides covering the entire 213 amino acid sequence of
the protein. Then, a complete replacement set of peptides in which all 20 amino acids were
substituted in turn at every position within the epitope was synthesized, and the particular amino
acids conferring specificity for the reaction with antibody were determined. Thus, peptide
analogs of the epitope-bearing peptides of the invention can be made routinely by this method.
U.S. Patent No. 4,708,781 to Geysen (1987) further describes this method of identifying a
peptide bearing an immunogenic epitope of a desired protein.
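
The peptide counts in the scanning approach described above follow from simple window arithmetic: a sequence of n residues contains n - k + 1 overlapping peptides of length k, so 213 residues yield 213 - 6 + 1 = 208 hexapeptides. The sketch below illustrates both the overlapping set and a complete replacement set; the function names and the placeholder sequences are assumptions for illustration, not the actual coat protein sequence.

```python
# Illustrative sketch of overlapping-peptide scanning and a replacement set.
# Sequences are synthetic placeholders, not the foot-and-mouth disease virus protein.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def overlapping_peptides(sequence, length=6):
    """All contiguous peptides of the given length, offset by one residue."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def replacement_set(epitope):
    """Substitute each of the 20 residues in turn at every position.

    The parent sequence therefore appears once per position.
    """
    return [
        epitope[:position] + residue + epitope[position + 1:]
        for position in range(len(epitope))
        for residue in AMINO_ACIDS
    ]


placeholder_protein = (AMINO_ACIDS * 11)[:213]  # synthetic 213-residue sequence
hexapeptides = overlapping_peptides(placeholder_protein, length=6)
```

For a 213-residue input, `hexapeptides` contains 208 entries, matching the number given in the text; a replacement set for a hexapeptide epitope contains 6 x 20 = 120 peptides.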

Further still, U.S. Patent No. 5,194,392, to Geysen (1990), describes a general method
of detecting or determining the sequence of monomers (amino acids or other compounds) which
is a topological equivalent of the epitope (i.e., a "mimotope") which is complementary to a
particular paratope (antigen binding site) of an antibody of interest. More generally, U.S. Patent
No. 4,433,092, also to Geysen (1989), describes a method of detecting or determining a
sequence of monomers which is a topographical equivalent of a ligand which is complementary
to the ligand binding site of a particular receptor of interest. Similarly, U.S. Patent No.
5,480,971 to Houghten, R. A. et al. (1996) discloses linear C1-C7-alkyl peralkylated
oligopeptides and sets and libraries of such peptides, as well as methods for using such
oligopeptide sets and libraries for determining the sequence of a peralkylated oligopeptide that
preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the
epitope-bearing peptides of the invention also can be made routinely by these methods. The
entire disclosure of each document cited in this section on "Polypeptides and Fragments" is
hereby incorporated herein by reference.

As one of skill in the art will appreciate, the polypeptides of the present invention and the
epitope-bearing fragments thereof described above can be combined with parts of the constant
domain of immunoglobulins (IgG), resulting in chimeric polypeptides. These fusion proteins
facilitate purification and show an increased half-life in vivo. This has been shown, e.g., for
chimeric proteins consisting of the first two domains of the human CD4-polypeptide and various
domains of the constant regions of the heavy or light chains of mammalian immunoglobulins
(EPA 0,394,827; Traunecker et al. (1988) Nature 331:84-86). Fusion proteins that have a
disulfide-linked dimeric structure due to the IgG part can also be more efficient in binding and
neutralizing other molecules than a monomeric T. pallidum polypeptide or fragment thereof
alone. See Fountoulakis et al. (1995) J. Biochem. 270:3958-3964. Nucleic acids encoding the
above epitopes of T. pallidum polypeptides can also be recombined with a gene of interest as an
epitope tag to aid in detection and purification of the expressed polypeptide.

3. Diagnostic Assays and Kits
The present invention further relates to methods for assaying Borrelia infection in an
animal by detecting the expression of genes encoding Borrelia polypeptides of the present
invention. The methods comprise analyzing tissue or body fluid from the animal for
Borrelia-specific antibodies, nucleic acids, or proteins. Analysis of nucleic acid specific to
Borrelia is assayed by PCR or hybridization techniques using nucleic acid sequences of the
present invention as either hybridization probes or primers. See, e.g., Sambrook et al.
Molecular cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2nd ed., 1989,
page 54 reference); Eremeeva et al. (1994) J. Clin. Microbiol. 32:803-810 (describing
differentiation among spotted fever group Rickettsiae species by analysis of restriction fragment
length polymorphism of PCR-amplified DNA) and Chen et al. (1994) J. Clin. Microbiol.
32:589-595 (detecting T. pallidum nucleic acids via PCR).

Where diagnosis of a disease state related to infection with Borrelia has already been
made, the present invention is useful for monitoring progression or regression of the disease state
whereby patients exhibiting enhanced Borrelia gene expression will experience a worse clinical
outcome relative to patients expressing these gene(s) at a lower level.



By "biological sample" is intended any biological sample obtained from an animal, cell
line, tissue culture, or other source which contains Borrelia polypeptide, mRNA, or DNA.
Biological samples include body fluids (such as saliva, blood, plasma, urine, mucus, synovial
fluid, etc.), tissues (such as muscle, skin, and cartilage) and any other biological source suspected
of containing Borrelia polypeptides or nucleic acids. Methods for obtaining biological samples
such as tissue are well known in the art.

The present invention is useful for detecting diseases related to Borrelia infections in
animals. Preferred animals include monkeys, apes, cats, dogs, birds, cows, pigs, mice, horses,
rabbits and humans. Particularly preferred are humans.

Total RNA can be isolated from a biological sample using any suitable technique such as
the single-step guanidinium-thiocyanate-phenol-chloroform method described in Chomczynski et
al. (1987) Anal. Biochem. 162:156-159. mRNA encoding Borrelia polypeptides having
sufficient homology to the nucleic acid sequences identified in SEQ ID NOS: 1-744 to allow for
hybridization between complementary sequences is then assayed using any appropriate method.
These include Northern blot analysis, S1 nuclease mapping, the polymerase chain reaction
(PCR), reverse transcription in combination with the polymerase chain reaction (RT-PCR), and
reverse transcription in combination with the ligase chain reaction (RT-LCR).

Northern blot analysis can be performed as described in Harada et al. (1990) Cell
63:303-312. Briefly, total RNA is prepared from a biological sample as described above. For
the Northern blot, the RNA is denatured in an appropriate buffer (such as glyoxal/dimethyl
sulfoxide/sodium phosphate buffer), subjected to agarose gel electrophoresis, and transferred
onto a nitrocellulose filter. After the RNAs have been linked to the filter by a UV linker, the filter
is prehybridized in a solution containing formamide, SSC, Denhardt's solution, denatured
salmon sperm, SDS, and sodium phosphate buffer. A T. pallidum polynucleotide sequence
shown in SEQ ID NOS: 1-744, or portion thereof, labeled according to any appropriate method
(such as the 32P-multiprimed DNA labeling system (Amersham)) is used as probe. After
hybridization overnight, the filter is washed and exposed to x-ray film. DNA for use as probe
according to the present invention is described in the sections above and will preferably be at least
15 nucleotides in length.

S1 mapping can be performed as described in Fujita et al. (1987) Cell 49:357-367. To
prepare probe DNA for use in S1 mapping, the sense strand of an above-described T. pallidum
DNA sequence of the present invention is used as a template to synthesize labeled antisense
DNA. The antisense DNA can then be digested using an appropriate restriction endonuclease to
generate further DNA probes of a desired length. Such antisense probes are useful for
visualizing protected bands corresponding to the target mRNA (i.e., mRNA encoding Borrelia
polypeptides).
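
The relationship between the sense template and the antisense probe, and the subsequent shortening of the probe by restriction digestion, can be sketched in silico. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the EcoRI recognition site (GAATTC, cut after the first G) is used only as an example enzyme, and the digestion is modeled on a single strand for simplicity.

```python
# Illustrative sketch: derive an antisense sequence and cut it at a restriction site.
# Function names are hypothetical; EcoRI is only an example enzyme.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense):
    """Reverse complement of a DNA sense strand, written 5' to 3'."""
    return sense.translate(_COMPLEMENT)[::-1]


def digest(dna, site="GAATTC", cut_offset=1):
    """Split `dna` at every occurrence of `site`, cutting `cut_offset`
    bases into the site (single-strand simplification)."""
    fragments, start = [], 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + cut_offset])
        start = position + cut_offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments
```

For example, `antisense("ATGGCC")` gives `"GGCCAT"`, and `digest("AAGAATTCTT")` gives `["AAG", "AATTCTT"]`.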



Levels of mRNA encoding Borrelia polypeptides are assayed, e.g., using the
RT-PCR method described in Makino et al. (1990) Technique 2:295-301. By this method, the
radioactivities of the "amplicons" in the polyacrylamide gel bands are linearly related to the initial
concentration of the target mRNA. Briefly, this method involves adding total RNA isolated from
a biological sample in a reaction mixture containing a RT primer and appropriate buffer. After
incubating for primer annealing, the mixture can be supplemented with a RT buffer, dNTPs,
DTT, RNase inhibitor and reverse transcriptase. After incubation to achieve reverse transcription
of the RNA, the RT products are then subject to PCR using labeled primers. Alternatively, rather
than labeling the primers, a labeled dNTP can be included in the PCR reaction mixture. PCR
amplification can be performed in a DNA thermal cycler according to conventional techniques.
After a suitable number of rounds to achieve amplification, the PCR reaction mixture is
electrophoresed on a polyacrylamide gel. After drying the gel, the radioactivity of the appropriate
bands (corresponding to the mRNA encoding the Borrelia polypeptides of the present invention)
is quantified using an imaging analyzer. RT and PCR reaction ingredients and conditions,
reagent and gel concentrations, and labeling methods are well known in the art. Variations on the
RT-PCR method will be apparent to the skilled artisan. Other PCR methods that can detect the
nucleic acid of the present invention can be found in PCR PRIMER: A LABORATORY
MANUAL (C.W. Dieffenbach et al. eds., Cold Spring Harbor Lab Press, 1995).

The polynucleotides of the present invention, including both DNA and RNA, may be
used to detect polynucleotides of the present invention or Borrelia species including T. pallidum
using bio chip technology. The present invention includes both high density chip arrays (>1000
oligonucleotides per cm2) and low density chip arrays (<1000 oligonucleotides per cm2). Bio
chips comprising arrays of polynucleotides of the present invention may be used to detect
Borrelia species, including T. pallidum, in biological and environmental samples and to diagnose
an animal, including humans, with a T. pallidum or other Borrelia infection. The bio chips of
the present invention may comprise polynucleotide sequences of other pathogens including
bacteria, viral, parasitic, and fungal polynucleotide sequences, in addition to the polynucleotide
sequences of the present invention, for use in rapid differential pathogenic detection and
diagnosis. The bio chips can also be used to monitor a T. pallidum or other Borrelia infection
and to monitor the genetic changes (deletions, insertions, mismatches, etc.) in response to drug
therapy in the clinic and drug development in the laboratory. The bio chip technology comprising
arrays of polynucleotides of the present invention may also be used to simultaneously monitor the
expression of a multiplicity of genes, including those of the present invention. The
polynucleotides used to comprise a selected array may be specified in the same manner as for the
fragments, i.e., by their 5' and 3' positions or length in contiguous base pairs and include from.

Methods and particular uses of the polynucleotides of the present invention to detect Borrelia
species, including T. pallidum, using bio chip technology include those known in the art and
those of: U.S. Patent Nos. 5510270, 5545531, 5445934, 5677195, 5532128, 5556752,
5527681, 5451683, 5424186, 5607646, 5658732 and World Patent Nos. WO/9710365,
WO/9511995, WO/9743447, WO/9535505, each incorporated herein in their entireties.

Biosensors using the polynucleotides of the present invention may also be used to detect,
diagnose, and monitor T. pallidum or other Borrelia species and infections thereof. Biosensors
using the polynucleotides of the present invention may also be used to detect particular
polynucleotides of the present invention. Biosensors using the polynucleotides of the present
invention may also be used to monitor the genetic changes (deletions, insertions, mismatches,
etc.) in response to drug therapy in the clinic and drug development in the laboratory. Methods
and particular uses of the polynucleotides of the present invention to detect Borrelia species,
including T. pallidum, using biosensors include those known in the art and those of: U.S. Patent
Nos. 5721102, 5658732, 5631170, and World Patent Nos. WO97/35011, WO/9720203, each
incorporated herein in their entireties.

Thus, the present invention includes both bio chips and biosensors comprising
polynucleotides of the present invention and methods of their use.

Assaying Borrelia polypeptide levels in a biological sample can occur using any
art-known method, such as antibody-based techniques. For example, Borrelia polypeptide
expression in tissues can be studied with classical immunohistological methods. In these, the
specific recognition is provided by the primary antibody (polyclonal or monoclonal) but the
secondary detection system can utilize fluorescent, enzyme, or other conjugated secondary
antibodies. As a result, an immunohistological staining of tissue section for pathological
examination is obtained. Tissues can also be extracted, e.g., with urea and neutral detergent, for
the liberation of Borrelia polypeptides for Western-blot or dot/slot assay. See, e.g., Jalkanen,
M. et al. (1985) J. Cell. Biol. 101:976-985; Jalkanen, M. et al. (1987) J. Cell. Biol.
105:3087-3096. In this technique, which is based on the use of cationic solid phases,
quantitation of a Borrelia polypeptide can be accomplished using an isolated Borrelia polypeptide
as a standard. This technique can also be applied to body fluids.




Other antibody-based methods useful for detecting Borrelia polypeptide gene expression
include immunoassays, such as the ELISA and the radioimmunoassay (RIA). For example, a
Borrelia polypeptide-specific monoclonal antibody can be used both as an immunoabsorbent
and as an enzyme-labeled probe to detect and quantify a Borrelia polypeptide. The amount of a
Borrelia polypeptide present in the sample can be calculated by reference to the amount present in
a standard preparation using a linear regression computer algorithm. Such an ELISA is described
in Iacobelli et al. (1988) Breast Cancer Research and Treatment 11:19-30. In another ELISA
assay, two distinct specific monoclonal antibodies can be used to detect Borrelia polypeptides in a
body fluid. In this assay, one of the antibodies is used as the immunoabsorbent and the other as
the enzyme-labeled probe.
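
The calculation of sample amount by reference to a standard preparation using linear regression can be sketched as follows. This is a minimal illustration, assuming the signal is linear in concentration over the range of the standards; the function names and the example values are hypothetical and are not data from the cited reference.

```python
# Illustrative sketch: ordinary least-squares standard curve for an ELISA.
# Function names and numeric values are hypothetical examples.
def fit_standard_curve(concentrations, signals):
    """Fit signal = slope * concentration + intercept by least squares."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return (signal - intercept) / slope


# Hypothetical standards: concentration (arbitrary units) vs. absorbance.
standards = [0.0, 1.0, 2.0, 4.0]
absorbances = [0.1, 0.3, 0.5, 0.9]
slope, intercept = fit_standard_curve(standards, absorbances)
estimate = concentration_from_signal(0.7, slope, intercept)
```

With these made-up standards the fitted line has slope 0.2 and intercept 0.1, so an absorbance of 0.7 corresponds to about 3.0 concentration units. Readings outside the range of the standards should not be extrapolated from such a fit.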

The above techniques may be conducted essentially as a "one-step" or "two-step" assay.
The "one-step" assay involves contacting the Borrelia polypeptide with immobilized antibody
and, without washing, contacting the mixture with the labeled antibody. The "two-step" assay
involves washing before contacting the mixture with the labeled antibody. Other conventional
methods may also be employed as suitable. It is usually desirable to immobilize one component
of the assay system on a support, thereby allowing other components of the system to be brought
into contact with the component and readily removed from the sample. Variations of the above
and other immunological methods included in the present invention can also be found in Harlow
et al., ANTIBODIES: A LABORATORY MANUAL, (Cold Spring Harbor Laboratory Press,
2nd ed. 1988).

Suitable enzyme labels include, for example, those from the oxidase group, which
catalyze the production of hydrogen peroxide by reacting with substrate. Glucose oxidase is
particularly preferred as it has good stability and its substrate (glucose) is readily available.
Activity of an oxidase label may be assayed by measuring the concentration of hydrogen peroxide
formed by the enzyme-labeled antibody/substrate reaction. Besides enzymes, other suitable
labels include radioisotopes, such as iodine (125I, 121I), carbon (14C), sulphur (35S), tritium (3H),
indium (112In), and technetium (99mTc), and fluorescent labels, such as fluorescein and
rhodamine, and biotin.

Further suitable labels for the Borrelia polypeptide-specific antibodies of the present
invention are provided below. Examples of suitable enzyme labels include malate
dehydrogenase, Borrelia nuclease, delta-5-steroid isomerase, yeast-alcohol dehydrogenase,
alpha-glycerol phosphate dehydrogenase, triose phosphate isomerase, peroxidase, alkaline
phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase,
glucose-6-phosphate dehydrogenase, glucoamylase, and acetylcholine esterase.

Examples of suitable radioisotopic labels include 3H, 111In, 125I, 131I, 32P, 35S, 14C, 51Cr,
57To, 58Co, 59Fe, 75Se, 152Eu, 67Cu, 217Ci, 211At, 212Pb, 47Sc, 109Pd, etc. 111In is a preferred
isotope where in vivo imaging is used since it avoids the problem of dehalogenation of the 125I
or 131I-labeled monoclonal antibody by the liver. In addition, this radionuclide has a more
favorable gamma emission energy for imaging. See, e.g., Perkins et al. (1985) Eur. J. Nucl.
Med. 10:296-301; Carasquillo et al. (1987) J. Nucl. Med. 28:281-287. For example, 111In
coupled to monoclonal antibodies with 1-(p-isothiocyanatobenzyl)-DTPA has shown little uptake
in non-tumorous tissues, particularly the liver, and therefore enhances specificity of tumor
localization. See Esteban et al. (1987) J. Nucl. Med. 28:861-870.

Examples of suitable non-radioactive isotopic labels include 157Gd, 55Mn, 162Dy, 52Tr,
and 56Fe.

Examples of suitable fluorescent labels include a 152Eu label, a fluorescein label, an
isothiocyanate label, a rhodamine label, a phycoerythrin label, a phycocyanin label, an
allophycocyanin label, an o-phthaldehyde label, and a fluorescamine label.

Examples of suitable toxin labels include Pseudomonas toxin, diphtheria toxin, ricin,
and cholera toxin.

Examples of chemiluminescent labels include a luminol label, an isoluminol label, an
aromatic acridinium ester label, an imidazole label, an acridinium salt label, an oxalate ester label,
a luciferin label, a luciferase label, and an aequorin label.

Examples of nuclear magnetic resonance contrasting agents include heavy metal nuclei
such as Gd, Mn, and iron.

Typical techniques for binding the above-described labels to antibodies are provided by
Kennedy et al. (1976) Clin. Chim. Acta 70:1-31, and Schurs et al. (1977) Clin. Chim. Acta
81:1-40. Coupling techniques mentioned in the latter are the glutaraldehyde method, the
periodate method, the dimaleimide method, and the m-maleimidobenzyl-N-hydroxy-succinimide
ester method, all of which methods are incorporated by reference herein.

In a related aspect, the invention includes a diagnostic kit for use in screening serum
containing antibodies specific against T. pallidum infection. Such a kit may include an isolated
T. pallidum antigen comprising an epitope which is specifically immunoreactive with at least one
anti-T. pallidum antibody. Such a kit also includes means for detecting the binding of said
antibody to the antigen. In specific embodiments, the kit may include a recombinantly produced
or chemically synthesized peptide or polypeptide antigen. The peptide or polypeptide antigen
may be attached to a solid support.

In a more specific embodiment, the detecting means of the above-described kit includes a
solid support to which said peptide or polypeptide antigen is attached. Such a kit may also
include a non-attached reporter-labeled anti-human antibody. In this embodiment, binding of the
antibody to the T. pallidum antigen can be detected by binding of the reporter-labeled antibody to
the anti-T. pallidum polypeptide antibody.

Specifically, the invention provides a compartmentalized kit to receive, in close
confinement, one or more containers, which comprises: (a) a first container comprising one of the
DFs or antibodies of the present invention; and (b) one or more other containers comprising one
or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or
antibody.

In detail, a compartmentalized kit includes any kit in which reagents are contained in
separate containers. Such containers include small glass containers, plastic containers or strips of
plastic or paper. Such containers allow one to efficiently transfer reagents from one
compartment to another compartment such that the samples and reagents are not
cross-contaminated, and the agents or solutions of each container can be added in a quantitative
fashion from one compartment to another. Such containers will include a container which will
accept the test sample, a container which contains the antibodies used in the assay, containers
which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers
which contain the reagents used to detect the bound antibody or DF.



In a related aspect, the invention includes a method of detecting T. pallidum infection in a
subject. This detection method includes reacting a body fluid, preferably serum, from the subject
with an isolated T. pallidum antigen, and examining the antigen for the presence of bound
antibody. In a specific embodiment, the method includes a polypeptide antigen attached to a solid
support, and serum is reacted with the support. Subsequently, the support is reacted with a
reporter-labeled anti-human antibody. The support is then examined for the presence of
reporter-labeled antibody.

The solid surface reagent employed in the above assays and kits is prepared by known
techniques for attaching protein material to solid support material, such as polymeric beads, dip
sticks, 96-well plates or filter material. These attachment methods generally include non-specific
adsorption of the protein to the support or covalent attachment of the protein, typically through a
free amine group, to a chemically reactive group on the solid support, such as an activated
carboxyl, hydroxyl, or aldehyde group. Alternatively, streptavidin coated plates can be used in
conjunction with biotinylated antigen(s).

The polypeptides and antibodies of the present invention, including fragments thereof,
may be used to detect Borrelia species including T. pallidum using bio chip and biosensor
technology. Bio chips and biosensors of the present invention may comprise the polypeptides of
the present invention to detect antibodies which specifically recognize Borrelia species, including
T. pallidum. Bio chips and biosensors of the present invention may also comprise antibodies
which specifically recognize the polypeptides of the present invention to detect Borrelia species,
including T. pallidum, or specific polypeptides of the present invention. Bio chips or biosensors
comprising polypeptides or antibodies of the present invention may be used to detect Borrelia
species, including T. pallidum, in biological and environmental samples and to diagnose an
animal, including humans, with a T. pallidum or other Borrelia infection. Thus, the present
invention includes both bio chips and biosensors comprising polypeptides or antibodies of the
present invention and methods of their use.

The bio chips of the present invention may further comprise polypeptide sequences of
other pathogens including bacterial, viral, parasitic, and fungal polypeptide sequences, in addition
to the polypeptide sequences of the present invention, for use in rapid differential pathogenic
detection and diagnosis. The bio chips of the present invention may further comprise antibodies
or fragments thereof specific for other pathogens including bacterial, viral, parasitic, and fungal
polypeptide sequences, in addition to the antibodies or fragments thereof of the present
invention, for use in rapid differential pathogenic detection and diagnosis. The bio chips and
biosensors of the present invention may also be used to monitor a T. pallidum or other Borrelia
infection and to monitor the genetic changes (amino acid deletions, insertions, substitutions, etc.)
in response to drug therapy in the clinic and drug development in the laboratory. The bio chips
and biosensors comprising polypeptides or antibodies of the present invention may also be used
to simultaneously monitor the expression of a multiplicity of polypeptides, including those of the
present invention. The polypeptides used to comprise a bio chip or biosensor of the present
invention may be specified in the same manner as for the fragments, i.e., by their N-terminal and
C-terminal positions or length in contiguous amino acid residues. Methods and particular uses of
the polypeptides and antibodies of the present invention to detect Borrelia species, including
T. pallidum, or specific polypeptides using bio chip and biosensor technology include those known
in the art, those of the U.S. Patent Nos. and World Patent Nos. listed above for bio chips and
biosensors using polynucleotides of the present invention, and those of: U.S. Patent Nos.
5658732, 5135852, 5567301, 5677196, 5690894 and World Patent Nos. WO9729366,
WO9612957, each incorporated herein in their entireties.

4. Screening Assay for Binding Agents

Using the isolated proteins of the present invention, the present invention further provides
methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs
of the present invention or to one of the fragments and the T. pallidum fragment and contigs
herein described.

In general, such methods comprise steps of:

(a) contacting an agent with an isolated protein encoded by one of the ORFs of the
present invention, or an isolated fragment of the T. pallidum genome; and

(b) determining whether the agent binds to said protein or said fragment.

The agents screened in the above assay can be, but are not limited to, peptides,
carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected
and screened at random or rationally selected or designed using protein modeling techniques.

For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and
the like are selected at random and are assayed for their ability to bind to the protein encoded by
the ORF of the present invention.

Alternatively, agents may be rationally selected or designed. As used herein, an agent is
said to be "rationally selected or designed" when the agent is chosen based on the configuration
of the particular protein. For example, one skilled in the art can readily adapt currently available
procedures to generate peptides, pharmaceutical agents and the like capable of binding to a
specific peptide sequence in order to generate rationally designed antipeptide peptides, for
example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic
Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al.
Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or the like.

In addition to the foregoing, one class of agents of the present invention, as broadly
described, can be used to control gene expression through binding to one of the ORFs or EMFs
of the present invention. As described above, such agents can be randomly screened or rationally
designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence
specific or element specific agents, modulating the expression of either a single ORF or multiple
ORFs which rely on the same EMF for expression control.

One class of DNA binding agents are agents which contain base residues which hybridize
or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic
phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric
derivatives which have base attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and are designed
to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al.,
Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al.,
Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem. 56:560
(1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca
Raton, FL (1988)). Triple helix-formation optimally results in a shut-off of RNA transcription
from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into
polypeptide. Both techniques have been demonstrated to be effective in model systems.
Information contained in the sequences of the present invention can be used to design antisense
and triple helix-forming oligonucleotides, and other DNA binding agents.
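As an illustration of how sequence information can be turned into a candidate antisense oligonucleotide of the 20 to 40 base length mentioned above, a minimal sketch is given below. It assumes that the target is supplied as the coding-strand DNA sequence and simply returns the reverse complement of a chosen window; the example sequence and the function names are hypothetical, and no attempt is made to model secondary structure, specificity, or backbone chemistry.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def antisense_oligo(coding_strand, start, length):
    """Antisense DNA oligo against coding_strand[start:start + length].

    The length is restricted to the 20 to 40 base range quoted in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("oligo length should be between 20 and 40 bases")
    region = coding_strand[start:start + length]
    if len(region) < length:
        raise ValueError("target window runs past the end of the sequence")
    return reverse_complement(region)


# Hypothetical coding-strand fragment, used only to exercise the functions.
target = "ATGGCTAGCAAGGAGCTGTTCACCGGTGTTGTA"
oligo = antisense_oligo(target, 0, 20)
print(oligo)
```

Because the oligonucleotide is the reverse complement of the coding strand, it is complementary to the corresponding mRNA region, which is the property the antisense approach relies on.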

5. Pharmaceutical Compositions and Vaccines

The present invention further provides pharmaceutical agents which can be used to
modulate the growth or pathogenicity of T. pallidum, or another related organism, in vivo or in
vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can
be formulated using known techniques to provide a pharmaceutical composition. As used
herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents
which are derived from the proteins encoded by the ORFs of the present invention or are agents
which are identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of
T. pallidum or a related organism, in vivo or in vitro," when the agent reduces the rate of growth,
rate of division, or viability of the organism in question. The pharmaceutical agents of the
present invention can modulate the growth or pathogenicity of an organism in many fashions,
although an understanding of the underlying mechanism of action is not needed to practice the
use of the pharmaceutical agents of the present invention. Some agents will modulate the growth
by binding to an important protein thus blocking the biological activity of the protein, while other
agents may bind to a component of the outer surface of the organism, blocking attachment or
rendering the organism more prone to attack by the body's natural immune system. Alternatively,
the agent may comprise a protein encoded by one of the ORFs of the present invention and serve
as a vaccine. The development and use of a vaccine based on outer membrane components are
well known in the art.

As used herein, a "related organism" is a broad term which refers to any organism whose
growth can be modulated by one of the pharmaceutical agents of the present invention. In
general, such an organism will contain a homolog of the protein which is the target of the
pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to
be bacterial but may be fungal or viral pathogens.

The pharmaceutical agents and compositions of the present invention may be administered
in a convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular,
subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are
administered in an amount which is effective for treating and/or prophylaxis of the specific
indication. In general, they are administered in an amount of at least about 1 mg/kg body weight
and in most cases they will be administered in an amount not in excess of about 1 g/kg body
weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body
weight daily, taking into account the routes of administration, symptoms, etc.

The agents of the present invention can be used in native form or can be modified to form
a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another
molecule when it contains additional chemical moieties not normally a part of the molecule. Such
moieties may improve the molecule's solubility, absorption, biological half life, etc. The
moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any
undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are
disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980)
cited elsewhere herein.

For example, such moieties may change an immunological character of the functional
derivative, such as affinity for a given antibody. Such changes in immunomodulation activity are
measured by the appropriate assay, such as a competitive type immunoassay. Modifications of
such protein properties as redox or thermal stability, biological half-life, hydrophobicity,
susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into
multimers also may be effected in this way and can be assayed by methods well known to the
skilled artisan.

The therapeutic effects of the agents of the present invention may be obtained by
providing the agent to a patient by any suitable means (e.g., inhalation, intravenously,
intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the
agent of the present invention so as to achieve an effective concentration within the blood or
tissue in which the growth of the organism is to be controlled. To achieve an effective blood
concentration, the preferred method is to administer the agent by injection. The administration
may be by continuous infusion, or by single or multiple injections.

In providing a patient with one of the agents of the present invention, the dosage of the
administered agent will vary depending upon such factors as the patient's age, weight, height,
sex, general medical condition, previous medical history, etc. In general, it is desirable to
provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10
mg/kg (body weight of patient), although a lower or higher dosage may be administered. The
therapeutically effective dose can be lowered by using combinations of the agents of the present
invention or another agent.

As used herein, two or more compounds or agents are said to be administered "in
combination" with each other when either (1) the physiological effects of each compound, or (2)
the serum concentrations of each compound can be measured at the same time. The composition
of the present invention can be administered concurrently with, prior to, or following the
administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an
amount sufficient to decrease the rate of growth (as defined above) of the target organism.

The administration of the agent(s) of the invention may be for either a "prophylactic" or
"therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of
any symptoms indicative of the organism's growth. The prophylactic administration of the
agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection.
When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an
indication of infection. The therapeutic administration of the compound(s) serves to attenuate the
pathological symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a mammal, or a
patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A
composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a
recipient patient. Such an agent is said to be administered in a "therapeutically effective amount"
if the amount administered is physiologically significant. An agent is physiologically significant
if its presence results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known methods to
prepare pharmaceutically useful compositions, whereby these materials, or their functional
derivatives, are combined in a mixture with a pharmaceutically acceptable carrier vehicle.
Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum
albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES,
16th Ed., Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a
pharmaceutically acceptable composition suitable for effective administration, such compositions
will contain an effective amount of one or more of the agents of the present invention, together
with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action.
Controlled release preparations may be achieved through the use of polymers to complex or absorb
one or more of the agents of the present invention. The controlled delivery may be effectuated by
a variety of well known techniques, including formulation with macromolecules such as, for
example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate,
methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the
macromolecules and the agent in the formulation, and by appropriate use of methods of
incorporation, which can be manipulated to effectuate a desired time course of release. Another
possible method to control the duration of action by controlled release preparations is to
incorporate agents of the present invention into particles of a polymeric material such as
polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers.
Alternatively, instead of incorporating these agents into polymeric particles, it is possible to
entrap these materials in microcapsules prepared, for example, by coacervation techniques or by
interfacial polymerization with, for example, hydroxymethylcellulose or gelatine-microcapsules
and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems,
for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and
nanocapsules or in macroemulsions. Such techniques are disclosed in REMINGTON'S
PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more 
containers filled with one or more of the ingredients of the pharmaceutical compositions of the 
invention. Associated with such container(s) can be a notice in the form prescribed by a 
governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological 
products, which notice reflects approval by the agency of manufacture, use or sale for human 
administration. 

In addition, the agents of the present invention may be employed in conjunction with
other therapeutic compounds.

6. Shot-Gun Approach to Megabase DNA Sequencing 

The present invention further demonstrates that a large sequence can be sequenced using a
random shotgun approach. This procedure, described in detail in the examples that follow, has
eliminated the up front cost of isolating and ordering overlapping or contiguous subclones prior
to the start of the sequencing protocols.

Certain aspects of the present invention are described in greater detail in the examples that 
follow. The examples are provided by way of illustration. Other aspects and embodiments of 
the present invention are contemplated by the inventors, as will be clear to those of skill in the art 
from reading the present disclosure. 



ILLUSTRATIVE EXAMPLES

LIBRARIES AND SEQUENCING

1. Shotgun Sequencing Probability Analysis 

The overall strategy for a shotgun approach to whole genome sequencing follows from
the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the
equation for the Poisson distribution. According to this treatment, the probability, P0, that any
given base in a sequence of size L, in nucleotides, is not sequenced after a certain amount, n, in
nucleotides, of random sequence has been determined can be calculated by the equation
P0 = e^(-m), where m is n/L, the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when
2.8 Mb of sequence has been randomly generated (1X coverage). At that point, P0 = e^(-1) = 0.37.
The probability that any given base has not been sequenced is the same as the probability that any
region of the whole sequence L has not been determined and, therefore, is equivalent to the
fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage,
approximately 37% of a polynucleotide of size L, in nucleotides, has not been sequenced. When
14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb genome and the unsequenced
fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by
sequencing approximately 17,000 random clones from both insert ends with an average sequence
read length of 410 bp.
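The coverage arithmetic in this paragraph can be checked with a short calculation. The sketch below assumes the Poisson model exactly as stated, with fold coverage taken as bases sequenced divided by genome size, and uses the 2.8 Mb genome size and 410 bp read length quoted in the text; the function names are illustrative only.

```python
import math


def fold_coverage(bases_sequenced, genome_size):
    """Fold coverage m: random sequence generated divided by genome size."""
    return bases_sequenced / genome_size


def unsequenced_fraction(coverage):
    """Poisson probability P0 = e^(-m) that a given base is not sequenced."""
    return math.exp(-coverage)


def reads_for_coverage(coverage, genome_size, read_length):
    """Number of reads of a given average length needed for a target coverage."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 2_800_000   # 2.8 Mb, as in the text
READ_LENGTH = 410         # average read length in bp, as in the text

p0_1x = unsequenced_fraction(fold_coverage(2_800_000, GENOME_SIZE))
p0_5x = unsequenced_fraction(fold_coverage(14_000_000, GENOME_SIZE))
reads_5x = reads_for_coverage(5, GENOME_SIZE, READ_LENGTH)
clones_5x = math.ceil(reads_5x / 2)  # each clone is read from both insert ends

print(f"P0 at 1X: {p0_1x:.2f}")
print(f"P0 at 5X: {p0_5x:.4f}")
print(f"reads for 5X: {reads_5x}, clones: {clones_5x}")
```

Under these assumptions about 34,000 reads, or roughly 17,000 clones sequenced from both ends, are needed for 5X coverage, which agrees with the figure given above.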

Similarly, the total gap length, G, is determined by the equation G = L e^(-m), and the
average gap size, g, follows the equation g = L/N, where N is the number of random sequence
reads. Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a sequence of a
polynucleotide 2.8 Mb long.
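The gap estimates can be reproduced in the same way. The sketch below assumes that the divisor in the average gap size is the number of sequence reads (about 34,000, i.e. 17,000 clones read from both ends), since that assumption is the one that recovers the 82 bp figure; the function name and the returned tuple layout are illustrative only.

```python
import math


def gap_statistics(genome_size, n_reads, read_length):
    """Expected gap statistics for random shotgun reads under the Poisson model.

    Returns (coverage, total_gap_length, average_gap_size, number_of_gaps).
    """
    coverage = n_reads * read_length / genome_size
    total_gap_length = genome_size * math.exp(-coverage)  # G = L * e^(-m)
    average_gap_size = genome_size / n_reads              # g = L / N
    number_of_gaps = total_gap_length / average_gap_size  # = N * e^(-m)
    return coverage, total_gap_length, average_gap_size, number_of_gaps


# 2.8 Mb genome, 17,000 clones sequenced from both ends, 410 bp reads.
coverage, total_gap, avg_gap, n_gaps = gap_statistics(2_800_000, 34_000, 410)

print(f"coverage: {coverage:.2f}X")
print(f"total gap length: {total_gap:.0f} bp")
print(f"average gap size: {avg_gap:.1f} bp")
print(f"expected number of gaps: {n_gaps:.0f}")
```

With these inputs the model gives an average gap of about 82 bp and on the order of 230 gaps, in line with the approximate figures quoted in the text.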

The treatment above is essentially that of Lander and Waterman, Genomics 2:231
(1988).

2. Random Library Construction

In order to approximate the random model described above during actual sequencing, a 
nearly ideal library of cloned genomic fragments is required. The following library construction 
procedure was developed to achieve this end. 

T. pallidum DNA is prepared by phenol extraction. A mixture containing 200 μg DNA in
1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50% glycerol is
processed through a nebulizer (IPI Medical Products) with a stream of nitrogen adjusted to 35
kPa for 2 minutes. The sheared DNA is ethanol precipitated and redissolved in 500 μl TE
buffer.

To create blunt-ends, a 100 μl aliquot of the resuspended DNA is digested with 5 units of
BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 μl BAL31 buffer. The
digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 μl TE buffer, and
then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel.
The section containing DNA fragments 1.6-2.0 kb in size is excised from the gel, and the LGT
agarose is melted and the resulting solution is extracted with phenol to separate the agarose from
the DNA. DNA is ethanol precipitated and redissolved in 20 μl of TE buffer for ligation to
vector.

A two-step ligation procedure is used to produce a plasmid library with 97% inserts, of
which >99% were single inserts. The first ligation mixture (50 μl) contains 2 μg of DNA
fragments, 2 μg pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial
alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and is incubated at 14°C for 4 hr.
The ligation mixture then is phenol extracted and ethanol precipitated, and the precipitated DNA is
dissolved in 20 μl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete
bands in a ladder are visualized by ethidium bromide-staining and UV illumination and identified
by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. The portion of the gel containing v+i DNA
is excised and the v+i DNA is recovered and resuspended into 20 μl TE. The v+i DNA then is
blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 μl)
containing the v+i linears, 500 μM each of the 4 dNTPs, and 9 units of T4 polymerase (New
England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol
precipitation the repaired v+i linears are dissolved in 20 μl TE. The final ligation to produce
circles is carried out in a 50 μl reaction containing 5 μl of v+i linears and 5 units of T4 ligase at
14°C overnight. After 10 min. at 70°C the following day, the reaction mixture is stored at -20°C.

This two-stage procedure results in a molecularly random collection of single-insert
plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free
vector (<3%).

Since deviation from randomness can arise from propagation the DNA in the host, E. coli 
host cells deficient in all recombination and restriction functions (A. Greener, Strategies 3 (I):5 
(1990)) are used to prevent rearrangements, deletions, and loss of clones by restriction. 
Furthermore, transformed cells are plated directly on antibiotic diffusion plates to avoid the usual 

25 broth recovery phase which allows multiplication and selection of the most r^idly growing cells. 

Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II
Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled Falcon
2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the aliquot of cells
to a final concentration of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the
final ligation is added to the cells and incubated on ice for 30 min. The cells are heat pulsed for
30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture is
eliminated from this protocol in order to minimize the preferential growth of any given
transformed cell. Instead the transformation mixture is plated directly on a nutrient rich SOB
plate containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g tryptone, 5 g yeast extract,
0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with
0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is
supplemented with 1 ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4/100 ml SOB agar.
The 15 ml top layer is poured just prior to plating. Our titer is approximately 100 colonies/10 µl
aliquot of transformation.

All colonies are picked for template preparation regardless of size. Thus, only clones lost
due to "poison" DNA or deleterious gene products are deleted from the library, resulting in a
slight increase in gap number over that expected.
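
The reagent additions in the plating protocol above are simple dilutions, and a short
calculation makes the resulting concentrations explicit. The following Python sketch is
illustrative only: the function name is hypothetical, and it assumes that volumes are additive
and that the cells and the agar contain none of the added reagent beforehand.

```python
def final_concentration(stock_conc, stock_vol, base_vol):
    """Concentration after adding stock_vol of stock to base_vol (C1*V1 = C2*V2).

    stock_vol and base_vol must share a unit; the result has the unit of stock_conc.
    """
    return stock_conc * stock_vol / (stock_vol + base_vol)


# beta-mercaptoethanol: 1.7 µl of 1.42 M stock added to 100 µl of cells
bme_mM = final_concentration(1.42, 1.7, 100.0) * 1000

# ampicillin: 0.4 ml of 50 mg/ml stock added to 100 ml of SOB agar
amp_ug_per_ml = final_concentration(50.0, 0.4, 100.0) * 1000

print(f"beta-mercaptoethanol: {bme_mM:.1f} mM")
print(f"ampicillin: {amp_ug_per_ml:.0f} µg/ml")
```

Under these assumptions the beta-mercaptoethanol addition works out to roughly 24 mM, close
to the nominal 25 mM stated above, and the ampicillin to roughly 200 µg/ml.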



3. Random DNA Sequencing

High quality double stranded DNA plasmid templates are prepared using a "boiling bead"
method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg,
MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid
preparation is performed in a 96-well format for all stages of DNA preparation from bacterial
growth through final DNA purification. Template concentration is determined using Hoechst
Dye and a Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding templates
are identified where possible and not sequenced.

Templates are also prepared from two T. pallidum lambda genomic libraries. An
amplified library is constructed in the vector Lambda GEM-12 (Promega) and an unamplified
library is constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda
library, T. pallidum DNA (> 100 kb) is partially digested in a reaction mixture (200 µl)
containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min. at 23°C. The digested
DNA is phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for
7 hours. Fragments from 15 to 25 kb are excised and recovered in a final volume of 6 µl. One
µl of fragments is used with 1 µl of DASH II vector (Stratagene) in the recommended ligation
reaction. One µl of the ligation mixture is used per packaging reaction following the
recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711).
Phage are plated directly without amplification from the packaging mixture (after dilution with
500 µl of recommended SM buffer and chloroform treatment). Yield is about 2.5 x 10^3 pfu/µl.
The amplified library is prepared essentially as above except the lambda GEM-12 vector is used.
After packaging, about 3.5 x 10^4 pfu are plated on the restrictive NM539 host. The lysate is
harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is
approximately 1 x 10^9 pfu/ml.

Liquid lysates (100 µl) are prepared from randomly selected plaques (from the
unamplified library) and template is prepared by long-range PCR using T7 and T3 vector-specific
primers.

Sequencing reactions are carried out on plasmid and/or PCR templates using the AB
Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle
Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams
et al., Nature 368:414 (1994)). Dye terminator sequencing reactions are carried out on the
lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready
Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence the
ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are used to sequence
the ends of the inserts from the Lambda DASH II library. Sequencing reactions are performed
by eight individuals using an average of fourteen AB 373 DNA Sequencers per day. All
sequencing reactions are analyzed using the Stretch modification of the AB 373, primarily using a
34 cm well-to-read distance. The overall sequencing success rate is approximately
85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average
usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp
for dye-terminator reactions.
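
As a rough illustration of how the success rates and read lengths above combine, the expected
number of usable bases per attempted reaction can be estimated as the product of the two. The
Python sketch below is a simplification made for illustration (it assumes a failed reaction
contributes zero usable bases); the dictionary and function names are hypothetical.

```python
# (success rate, average usable read length in bp), as quoted in the text above
READ_STATS = {
    "M13-21": (0.85, 485),
    "M13RP1": (0.85, 445),
    "dye-terminator": (0.65, 375),
}


def expected_bases_per_attempt(chemistry):
    """Expected usable bases per attempted reaction, counting failures as zero."""
    success_rate, read_length = READ_STATS[chemistry]
    return success_rate * read_length


for name in READ_STATS:
    print(f"{name}: about {expected_bases_per_attempt(name):.0f} bp per attempted reaction")
```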

Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS,
M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994) described the
value of using sequence from both ends of sequencing templates to facilitate ordering of contigs
in shotgun assembly projects of lambda and cosmid clones. We balance the desirability of
both-end sequencing (including the reduced cost of lower total number of templates) against shorter
read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer compared to
the M13-21 (forward) primer. Approximately one-half of the templates are sequenced from both
ends. Random reverse sequencing reactions are done based on successful forward sequencing
reactions. Some M13RP1 sequences are obtained in a semi-directed fashion: M13-21 sequences
pointing outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to
specifically order contigs.

4. Protocol for Automated Cycle Sequencing 

The sequencing is carried out using ABI Catalyst robots and AB 373 Automated DNA
Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature
control robot which has been developed specifically for DNA sequencing reactions. The Catalyst
combines pre-aliquoted templates and reaction mixes consisting of deoxy- and
dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing
primers, and reaction buffer. Reaction mixes and templates are combined in the wells of an
aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e.,
one primer synthesis) steps are performed including denaturation, annealing of primer and
template, and extension; i.e., DNA synthesis. A heated lid with rubber gaskets on the
thermocycling plate prevents evaporation without the need for an oil overlay.
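
The term "linear amplification" above reflects the use of a single primer: extension products
cannot themselves serve as templates, so product accumulates in proportion to the number of
cycles rather than doubling each cycle as in two-primer PCR. The following Python sketch
contrasts the two idealized cases. It assumes perfect efficiency in every cycle, and the function
names are hypothetical.

```python
def linear_yield(template_molecules, cycles):
    """One primer: every cycle copies only the original template strands."""
    return template_molecules * cycles


def exponential_yield(template_molecules, cycles):
    """Two opposing primers (ordinary PCR): products also act as templates."""
    return template_molecules * 2 ** cycles


cycles = 30
print("cycle sequencing products per template:", linear_yield(1, cycles))
print("idealized PCR products per template:", exponential_yield(1, cycles))
```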

Two sequencing protocols are used: one for dye-labelled primers and a second for
dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled
sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled
with a different fluorescent dye, permitting the four individual reactions to be combined into one
lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently
supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template
reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated
templates with both dye-primers and dye-terminators with approximately equal fidelity, although
plasmid templates generally give longer usable sequences.



Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of 960
samples. Electrophoresis is run overnight following the manufacturer's protocols, and the data is
collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373
performs automatic lane tracking and base-calling. The lane-tracking is confirmed visually. Each
sequence electropherogram (or fluorescence lane trace) is inspected visually and assessed for
quality. Trailing sequences of low quality are removed and the sequence itself is loaded via
software to a Sybase database (archived daily to 8mm tape). Leading vector polylinker sequence
is removed automatically by a software program. Average edited lengths of sequences from the
standard ABI 373 are around 400 bp and depend mostly on the quality of the template used for
the sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a longer
electrophoresis path prior to fluorescence detection and increase the average number of usable
bases to 500-600 bp.

INFORMATICS 

1. Data Management

A number of information management systems for a large-scale sequencing lab have been
developed. (For review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth
Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press,
Washington D.C., 585 (1993).) The system used to collect and assemble the sequence data was
developed using the Sybase relational database management system and was designed to
automate data flow wherever possible and to reduce user error. The database stores and
correlates all information collected during the entire operation from template preparation to final
analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a
Macintosh platform and the data management system chosen was based on a Unix platform, it
was necessary to design and implement a variety of multi-user, client-server applications which
allow the raw data as well as analysis results to flow seamlessly into the database with a
minimum of user effort.

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of
thousands of sequence fragments was employed to generate contigs. The TIGR Assembler
simultaneously clusters and assembles fragments of the genome. In order to obtain the speed
necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp
oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The
number of potential overlaps for each fragment determines which fragments are likely to fall into
repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends
the current contig by attempting to add the best matching fragment based on oligonucleotide
content. The contig and candidate fragment are aligned using a modified version of the
Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M. S., Methods
in Enzymology 164:165 (1988)). The contig is extended by the fragment only if strict criteria for
the quality of the match are met. The match criteria include the minimum length of overlap, the
maximum length of an unmatched end, and the minimum percentage match. These criteria are
automatically lowered by the algorithm in regions of minimal coverage and raised in regions with
a possible repetitive element. Fragments representing the boundaries
of repetitive elements and potentially chimeric fragments are often rejected based on partial
mismatches at the ends of alignments and excluded from the current contig. TIGR Assembler is
designed to take advantage of clone size information coupled with sequencing from both ends of
each template. It enforces the constraint that sequence fragments from two ends of the same
template point toward one another in the contig and are located within a certain range of base
pairs (definable for each clone based on the known clone size range for a given library).
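
The hash-table step described above can be sketched in a few lines of Python. This is not the
TIGR Assembler code, which is not reproduced in this document; it is a minimal illustration of
the general idea of indexing 12 bp subsequences so that only fragments sharing at least one
12-mer are considered as potential overlaps. The function names are hypothetical, and the
sketch ignores reverse-complement matches and sequencing errors, which a real assembler must
handle.

```python
from collections import defaultdict
from itertools import combinations


def kmer_index(fragments, k=12):
    """Map every k-mer to the set of fragment ids whose sequence contains it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=12):
    """Count distinct shared k-mers per fragment pair.

    Pairs that share no k-mer never appear in the result, so they are never
    passed on to the (much slower) alignment step.
    """
    shared = defaultdict(int)
    for ids in kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


def overlap_counts(fragments, k=12):
    """Number of candidate partners per fragment; unusually high counts hint at repeats."""
    counts = defaultdict(int)
    for a, b in candidate_overlaps(fragments, k):
        counts[a] += 1
        counts[b] += 1
    return dict(counts)
```

Each candidate pair would then be aligned and accepted or rejected according to the match
criteria described above (minimum overlap length, maximum unmatched end, minimum percentage
match).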
The process resulted in 744 contigs as represented by SEQ ID NOs: 1-744.
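
The clone-size constraint that TIGR Assembler applies to reads from the two ends of one
template can be expressed as a simple consistency check. The Python sketch below is a
hypothetical illustration rather than the assembler's own code: the `PlacedRead` class, the
coordinate convention, and the insert-size limits in the usage example are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    start: int      # leftmost contig coordinate covered by the read
    end: int        # one past the rightmost contig coordinate covered
    forward: bool   # True if the read runs left-to-right along the contig


def mates_consistent(a, b, min_insert, max_insert):
    """True if two end reads of one template face each other at a plausible distance."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not (left.forward and not right.forward):
        return False  # the reads must point toward one another
    span = right.end - left.start  # implied clone insert length
    return min_insert <= span <= max_insert


forward_read = PlacedRead(start=100, end=585, forward=True)
reverse_read = PlacedRead(start=1700, end=2100, forward=False)
# Hypothetical library with inserts of 1.5 to 2.5 kb
print(mates_consistent(forward_read, reverse_read, 1500, 2500))
```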

3. Identifying Genes

The predicted coding regions of the T. pallidum genome were initially defined with the
program GeneMark, which finds ORFs using a probabilistic classification technique. The
predicted coding region sequences were used in searches against a database of all nucleotide
sequences from GenBank (June, 1997), using the BLASTN search method to identify overlaps
of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence
matches are shown in Table 1. The ORFs without such matches were translated to protein
sequences and compared to a non-redundant database of known proteins generated by combining
the Swiss-prot, PIR and GenPept databases. ORFs that matched a database protein with
BLASTP probability less than or equal to 0.01 are shown in Table 2. The table also lists
assigned functions based on the closest match in the databases. ORFs that did not match protein
or nucleotide sequences in the databases at these levels are shown in Table 3.
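
The three-way sorting of ORFs described above amounts to a short decision rule. The following
Python sketch restates that rule using the thresholds given in the text; the function name and
the representation of search results as plain numbers (with `None` meaning no hit) are
assumptions made for illustration, and the sketch does not run BLAST itself.

```python
from typing import Optional


def classify_orf(blastn_overlap_nt: Optional[int],
                 blastn_identity_pct: Optional[float],
                 blastp_probability: Optional[float]) -> str:
    """Assign an ORF to Table 1, 2 or 3 from its best BLASTN and BLASTP results."""
    has_nucleotide_match = (
        blastn_overlap_nt is not None
        and blastn_identity_pct is not None
        and blastn_overlap_nt >= 50
        and blastn_identity_pct >= 95.0
    )
    if has_nucleotide_match:
        return "Table 1"
    # Only ORFs without a nucleotide match are compared at the protein level
    if blastp_probability is not None and blastp_probability <= 0.01:
        return "Table 2"
    return "Table 3"


print(classify_orf(120, 98.5, None))    # nucleotide match
print(classify_orf(40, 99.0, 1e-20))    # overlap too short, protein match
print(classify_orf(None, None, 0.5))    # no match at either level
```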



ILLUSTRATIVE APPLICATIONS 

1. Production of an Antibody to a T. pallidum Protein 

Substantially pure protein or polypeptide is isolated from the transfected or transformed
cells using any one of the methods known in the art. The protein can also be produced in a
recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized.
Concentration of protein in the final preparation is adjusted, for example, by concentration on an
Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to
the protein can then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and isolated as
described can be prepared from murine hybridomas according to the classical method of Kohler,
G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a
mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a
few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen
isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells,
and the excess unfused cells destroyed by growth of the system on selective media comprising
aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution
placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing
clones are identified by detection of antibody in the supernatant fluid of the wells by
immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth.
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be
expanded and their monoclonal antibody product harvested for use. Detailed procedures for
monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular
Biology, Elsevier, New York, Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogenous epitopes of a single protein
can be prepared by immunizing suitable animals with the expressed protein described above,
which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody
production is affected by many factors related both to the antigen and the host species. For
example, small molecules tend to be less immunogenic than others and may require the use of
carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with
both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng
level) of antigen administered at multiple intradermal sites appear to be most reliable. An
effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin.
Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when
antibody titer thereof, as determined semi-quantitatively, for example, by double
immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for
example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Wier,
D., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2
mg/ml of serum (about 12 µM). Affinity of the antisera for the antigen is determined by preparing
competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of
Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology,
Washington, D.C. (1980).

Antibody preparations prepared according to either protocol are useful in quantitative
immunoassays which determine concentrations of antigen-bearing substances in biological
samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen
in a biological sample. In addition, antibodies are useful in various animal models of
pneumococcal disease as a means of evaluating the protein used to make the antibody as a
potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic
or immunoprophylactic reagent.

4. Preparation of PCR Primers and Amplification of DNA 

Various fragments of the T. pallidum genome, such as those of Tables 1-3 and SEQ ID
NOS: 1-744 can be used, in accordance with the present invention, to prepare PCR primers for a
variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least
18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have
approximately the same G/C ratio, so that melting temperatures are approximately the same. The
PCR primers and amplified DNA of this Example find use in the Examples that follow.
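
The primer preferences above (minimum length, similar G/C content, similar melting
temperature) can be checked programmatically. The Python sketch below is illustrative only. It
estimates melting temperature with the common rule of thumb Tm = 2(A+T) + 4(G+C), which is not
taken from this document and is only a rough guide for short oligonucleotides; the function
names, the 4°C tolerance, and the example sequences are assumptions.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def rule_of_thumb_tm(primer):
    """Approximate melting temperature in °C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def pair_is_balanced(forward, reverse, min_len=15, max_tm_diff=4):
    """True if both primers are long enough and their estimated Tm values are close."""
    if len(forward) < min_len or len(reverse) < min_len:
        return False
    return abs(rule_of_thumb_tm(forward) - rule_of_thumb_tm(reverse)) <= max_tm_diff


# Made-up 18-mers used only to exercise the functions
fwd, rev = "ATGGCTAGCAAGGTTCTG", "TTACGCCTTGATGCCAGA"
print(gc_fraction(fwd), rule_of_thumb_tm(fwd), rule_of_thumb_tm(rev), pair_is_balanced(fwd, rev))
```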

5. Isolation of a Selected DNA Clone From T. pallidum 

Three approaches are used to isolate a T. pallidum clone comprising a polynucleotide of
the present invention from any T. pallidum genomic DNA library. The T. pallidum strain
B31PU has been deposited as a convenient source for obtaining a T. pallidum strain although a
wide variety of T. pallidum strains can be used which are known in the art.

T. pallidum genomic DNA is prepared using the following method. A 20 ml overnight
bacterial culture is grown in a rich medium (e.g., Trypticase Soy Broth, Brain Heart Infusion broth
or Super broth), pelleted, washed two times with TES (30 mM Tris, pH 8.0, 25 mM EDTA, 50 mM
NaCl), and resuspended in 5 ml high salt TES (2.5 M NaCl). Lysostaphin is added to a final
concentration of approx. 50 µg/ml and the mixture is rotated slowly for 1 hour at 37°C to make
protoplast cells. The solution is then placed in an incubator (or a shaking water bath) and
warmed to 55°C. Five hundred microliters of 20% sarcosyl in TES (final concentration 2%) is
then added to lyse the cells. Next, guanidine HCl is added to a final concentration of 7 M (3.69 g
in 5.5 ml). The mixture is swirled slowly at 55°C for 60-90 min (solution should clear). A CsCl
gradient is then set up in SW41 ultra clear tubes using 2.0 ml 5.7 M CsCl and overlaying with
2.85 M CsCl. The gradient is carefully overlaid with the DNA-containing GuHCl solution.
The gradient is spun at 30,000 rpm, 20°C for 24 hr and the lower DNA band is collected. The
volume is increased to 5 ml with TE buffer. The DNA is then treated with protease K (10 µg/ml)
overnight at 37°C, and precipitated with ethanol. The precipitated DNA is resuspended in a
desired buffer.
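
The quantities quoted in the lysis steps above can be cross-checked with a short calculation.
The Python sketch below is illustrative: it takes the molar mass of guanidine hydrochloride as
95.53 g/mol, treats the stated 5.5 ml as the final volume (ignoring any volume change when the
solid dissolves), and assumes the sarcosyl is added to the 5 ml resuspension. The variable and
function names are hypothetical.

```python
GUHCL_MOLAR_MASS = 95.53  # g/mol, guanidine hydrochloride


def molarity(mass_g, molar_mass_g_per_mol, volume_ml):
    """Moles of solute per litre of solution."""
    return (mass_g / molar_mass_g_per_mol) / (volume_ml / 1000.0)


def diluted_percent(stock_percent, stock_vol_ml, base_vol_ml):
    """Percent concentration after adding a stock to a base volume."""
    return stock_percent * stock_vol_ml / (stock_vol_ml + base_vol_ml)


guhcl_molar = molarity(3.69, GUHCL_MOLAR_MASS, 5.5)   # 3.69 g in 5.5 ml
sarcosyl_percent = diluted_percent(20.0, 0.5, 5.0)     # 0.5 ml of 20% into 5 ml

print(f"guanidine HCl: {guhcl_molar:.2f} M")
print(f"sarcosyl: {sarcosyl_percent:.2f} %")
```

Under these assumptions the guanidine HCl comes to about 7.0 M, in agreement with the text, and
the sarcosyl to about 1.8%, which the text rounds to 2%.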

In the first method, a plasmid is directly isolated by screening a plasmid T. pallidum
genomic DNA library using a polynucleotide probe corresponding to a polynucleotide of the
present invention. Particularly, a specific polynucleotide with 30-40 nucleotides is synthesized
using an Applied Biosystems DNA synthesizer according to the sequence reported. The
oligonucleotide is labeled, for instance, with 32P-γ-ATP using T4 polynucleotide kinase and
purified according to routine methods. (See, e.g., Maniatis et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Press, Cold Spring, NY (1982).) The library is
transformed into a suitable host, as indicated above (such as XL-1 Blue (Stratagene)) using
techniques known to those of skill in the art. See, e.g., Sambrook et al., MOLECULAR
CLONING: A LABORATORY MANUAL (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel
et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (John Wiley and Sons, N.Y.
1989). The transformants are plated on 1.5% agar plates (containing the appropriate selection
agent, e.g., ampicillin) to a density of about 150 transformants (colonies) per plate. These plates
are screened using Nylon membranes according to routine methods for bacterial colony
screening. See, e.g., Sambrook et al., MOLECULAR CLONING: A LABORATORY
MANUAL (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., CURRENT PROTOCOLS
IN MOLECULAR BIOLOGY (John Wiley and Sons, N.Y. 1989) or other techniques known to
those of skill in the art.

Alternatively, two primers of 15-25 nucleotides derived from the 5' and 3' ends of a
polynucleotide of SEQ ID NOS: 1-744 are synthesized and used to amplify the desired DNA by
PCR using a T. pallidum genomic DNA prep as a template. PCR is carried out under routine
conditions, for instance, in 25 µl of reaction mixture with 0.5 µg of the above DNA template. A
convenient reaction mixture is 1.5-5 mM MgCl2, 0.01% (w/v) gelatin, 20 µM each of dATP,
dCTP, dGTP, dTTP, 25 pmol of each primer and 0.25 Unit of Taq polymerase. Thirty five
cycles of PCR (denaturation at 94°C for 1 min; annealing at 55°C for 1 min; elongation at 72°C
for 1 min) are performed with a Perkin-Elmer Cetus automated thermal cycler. The amplified
product is analyzed by agarose gel electrophoresis and the DNA band with expected molecular
weight is excised and purified. The PCR product is verified to be the selected sequence by
subcloning and sequencing the DNA product.
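
The reaction composition and cycling program above can be written down as data, which makes
the implied amounts easy to derive. The following Python sketch is illustrative; the names are
hypothetical, and the time estimate counts only the programmed hold steps, not the ramping
between temperatures or any initial denaturation or final extension, none of which are specified
in the text.

```python
REACTION_VOLUME_UL = 25.0


def pmol_in_reaction(conc_uM, volume_ul=REACTION_VOLUME_UL):
    """Amount of a component in pmol (µM multiplied by µl gives pmol)."""
    return conc_uM * volume_ul


def conc_uM_from_pmol(pmol, volume_ul=REACTION_VOLUME_UL):
    """Concentration in µM of a component added as a fixed number of pmol."""
    return pmol / volume_ul


# (step, temperature in °C, hold time in seconds) for one cycle, as given in the text
CYCLE = [("denaturation", 94, 60), ("annealing", 55, 60), ("elongation", 72, 60)]
N_CYCLES = 35


def hold_time_minutes(cycle=CYCLE, n_cycles=N_CYCLES):
    """Total programmed hold time in minutes, excluding temperature ramps."""
    return n_cycles * sum(seconds for _, _, seconds in cycle) / 60


print("each dNTP:", pmol_in_reaction(20.0), "pmol")
print("each primer:", conc_uM_from_pmol(25.0), "µM")
print("cycling hold time:", hold_time_minutes(), "min")
```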

Finally, overlapping oligos of the DNA sequences of SEQ ID NOS: 1-744 can be
chemically synthesized and used to generate a nucleotide sequence of desired length using PCR
methods known in the art.

6(a). Expression and Purification Borrelia polypeptides in E. coli 

The bacterial expression vector pQE60 is used for bacterial expression of some of the
polypeptide fragments of the present invention. (QIAGEN, Inc., 9259 Eton Avenue,
Chatsworth, CA, 91311). pQE60 encodes ampicillin antibiotic resistance ("Ampr") and contains
a bacterial origin of replication ("ori"), an IPTG inducible promoter, a ribosome binding site
("RBS"), six codons encoding histidine residues that allow affinity purification using
nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin (QIAGEN, Inc., supra) and suitable single
restriction enzyme cleavage sites. These elements are arranged such that an inserted DNA
fragment encoding a polypeptide expresses that polypeptide with the six His residues (i.e., a "6
X His tag") covalently linked to the carboxyl terminus of that polypeptide.

The DNA sequence encoding the desired portion of a T. pallidum protein of the present
invention is amplified from T. pallidum genomic DNA using PCR oligonucleotide primers
which anneal to the 5' and 3' sequences coding for the portions of the T. pallidum polynucleotide
shown in SEQ ID NOS: 1-744. Additional nucleotides containing restriction sites to facilitate
cloning in the pQE60 vector are added to the 5' and 3' sequences, respectively.

For cloning the mature protein, the 5' primer has a sequence containing an appropriate
restriction site followed by nucleotides of the amino terminal coding sequence of the desired T.
pallidum polynucleotide sequence in SEQ ID NOS: 1-744. One of ordinary skill in the art would
appreciate that the point in the protein coding sequence where the 5' and 3' primers begin may be
varied to amplify a DNA segment encoding any desired portion of the complete protein shorter or
longer than the mature form. The 3' primer has a sequence containing an appropriate restriction
site followed by nucleotides complementary to the 3' end of the polypeptide coding sequence of
SEQ ID NOS: 1-744, excluding a stop codon, with the coding sequence aligned with the
restriction site so as to maintain its reading frame with that of the six His codons in the pQE60
vector.
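
The primer layout described above (restriction site followed by the start of the coding
sequence for the 5' primer, and restriction site followed by the reverse complement of the end
of the coding sequence, minus the stop codon, for the 3' primer) can be sketched as follows.
This Python example is a hypothetical illustration: the function names, the 18-nucleotide
annealing length, the made-up ORF, and the BamHI (GGATCC) and HindIII (AAGCTT) recognition
sequences are placeholders, since the text does not name the sites. The sketch does not check
that the insert ends up in frame with the His codons; that depends on the vector map and any
spacer nucleotides and must be verified separately.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf, site_5p, site_3p, anneal_len=18):
    """Return (5' primer, 3' primer), each written 5' to 3'.

    The stop codon, if present, is removed so that translation can continue
    into a C-terminal tag encoded by the vector.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    coding = orf[:-3] if orf[-3:] in STOP_CODONS else orf
    if len(coding) < anneal_len:
        raise ValueError("coding sequence shorter than annealing length")
    forward = site_5p + coding[:anneal_len]
    reverse = site_3p + reverse_complement(coding[-anneal_len:])
    return forward, reverse


# Made-up 11-codon ORF used only to exercise the function
example_orf = "ATGAAACGTCTGGCAGTTCTGGGTACCGCTTAA"
forward_primer, reverse_primer = design_primers(example_orf, "GGATCC", "AAGCTT")
print(forward_primer)
print(reverse_primer)
```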

The amplified T. pallidum DNA fragment and the vector pQE60 are digested with
restriction enzymes which recognize the sites in the primers and the digested DNAs are then
ligated together. The T. pallidum DNA is inserted into the restricted pQE60 vector in a manner
which places the T. pallidum protein coding region downstream from the IPTG-inducible
promoter and in-frame with an initiating AUG and the six histidine codons.

The ligation mixture is transformed into competent E. coli cells using standard procedures
such as those described by Sambrook et al., supra. E. coli strain M15/rep4, containing multiple
copies of the plasmid pREP4, which expresses the lac repressor and confers kanamycin
resistance ("Kanr"), is used in carrying out the illustrative example described herein. This strain,
which is only one of many that are suitable for expressing a T. pallidum polypeptide, is available
commercially (QIAGEN, Inc., supra). Transformants are identified by their ability to grow on
LB agar plates in the presence of ampicillin and kanamycin. Plasmid DNA is isolated from
resistant colonies and the identity of the cloned DNA confirmed by restriction analysis, PCR and
DNA sequencing.

Clones containing the desired constructs are grown overnight ("O/N") in liquid culture in
LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25 µg/ml). The O/N
culture is used to inoculate a large culture, at a dilution of approximately 1:25 to 1:250. The cells
are grown to an optical density at 600 nm ("OD600") of between 0.4 and 0.6.
Isopropyl-β-D-thiogalactopyranoside ("IPTG") is then added to a final concentration of 1 mM to
induce transcription from the lac repressor sensitive promoter, by inactivating the lacI repressor.
Cells subsequently are incubated further for 3 to 4 hours. Cells then are harvested by
centrifugation.

The cells are then stirred for 3-4 hours at 4°C in 6 M guanidine-HCl, pH 8. The cell
debris is removed by centrifugation, and the supernatant containing the T. pallidum polypeptide
is loaded onto a nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin column (QIAGEN, Inc.,
supra). Proteins with a 6 x His tag bind to the Ni-NTA resin with high affinity and can be purified
in a simple one-step procedure (for details see: The QIAexpressionist, 1995, QIAGEN, Inc., supra).
Briefly, the supernatant is loaded onto the column in 6 M guanidine-HCl, pH 8, the column is
first washed with 10 volumes of 6 M guanidine-HCl, pH 8, then washed with 10 volumes of 6
M guanidine-HCl, pH 6, and finally the T. pallidum polypeptide is eluted with 6 M
guanidine-HCl, pH 5.

The purified protein is then renatured by dialyzing it against phosphate-buffered saline
(PBS) or 50 mM Na-acetate, pH 6 buffer plus 200 mM NaCl. Alternatively, the protein could be
successfully refolded while immobilized on the Ni-NTA column. The recommended conditions
are as follows: renature using a linear 6 M-1 M urea gradient in 500 mM NaCl, 20% glycerol, 20
mM Tris/HCl pH 7.4, containing protease inhibitors. The renaturation should be performed over
a period of 1.5 hours or more. After renaturation the proteins can be eluted by the addition of
250 mM imidazole. Imidazole is removed by a final dialyzing step against PBS or 50 mM
sodium acetate pH 6 buffer plus 200 mM NaCl. The purified protein is stored at 4°C or frozen at
-80°C.

The polypeptides of the present invention are also prepared using a non-denaturing protein
purification method. For these polypeptides, the cell pellet from each liter of culture is
resuspended in 25 ml of Lysis Buffer A at 4°C (Lysis Buffer A = 50 mM Na-phosphate, 300
mM NaCl, 10 mM 2-mercaptoethanol, 10% Glycerol, pH 7.5 with 1 tablet of Complete EDTA-free
protease inhibitor cocktail (Boehringer Mannheim #1873580) per 50 ml of buffer).
Absorbance at 550 nm is approximately 10-20 O.D./ml. The suspension is then put through
three freeze/thaw cycles from -70°C (using an ethanol-dry ice bath) up to room temperature. The
cells are lysed via sonication in short 10 sec bursts over 3 minutes at approximately 80 W while
kept on ice. The sonicated sample is then centrifuged at 15,000 RPM for 30 minutes at 4°C. The
supernatant is passed through a column containing 1.0 ml of CL-4B resin to pre-clear the sample
of any proteins that may bind to agarose non-specifically, and the flow-through fraction is
collected.

The pre-cleared flow-through is applied to a nickel-nitrilo-tri-acetic acid ("Ni-NTA")
affinity resin column (QIAGEN, Inc., supra). Proteins with a 6 X His tag bind to the Ni-NTA
resin with high affinity and can be purified in a simple one-step procedure. Briefly, the
supernatant is loaded onto the column in Lysis Buffer A at 4°C, the column is first washed with
10 volumes of Lysis Buffer A until the A280 of the eluate returns to the baseline. Then, the
column is washed with 5 volumes of 40 mM Imidazole (92% Lysis Buffer A / 8% Buffer B)
(Buffer B = 50 mM Na-Phosphate, 300 mM NaCl, 10% Glycerol, 10 mM 2-mercaptoethanol,
500 mM Imidazole, pH of the final buffer should be 7.5). The protein is eluted off of the column
with a series of increasing Imidazole solutions made by adjusting the ratios of Lysis Buffer A to
Buffer B. Three different concentrations are used: 3 volumes of 75 mM Imidazole, 3 volumes of
150 mM Imidazole, 5 volumes of 500 mM Imidazole. The fractions containing the purified
protein are analyzed using 8%, 10% or 14% SDS-PAGE depending on the protein size. The
purified protein is then dialyzed 2X against phosphate-buffered saline (PBS) in order to place it
into an easily workable buffer. The purified protein is stored at 4°C or frozen at -80°C.
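
Because Lysis Buffer A contains no imidazole and Buffer B contains 500 mM, each wash or elution
step above corresponds to a fixed mixing ratio of the two buffers. The Python sketch below
computes those ratios; it reproduces the 92% / 8% mix stated for the 40 mM wash. The names are
hypothetical, and the calculation assumes the two buffers are otherwise identical and mix
additively.

```python
BUFFER_B_IMIDAZOLE_MM = 500.0  # Buffer B; Lysis Buffer A contains no imidazole


def percent_buffer_b(target_mM):
    """Percentage of Buffer B needed to reach a target imidazole concentration."""
    if not 0 <= target_mM <= BUFFER_B_IMIDAZOLE_MM:
        raise ValueError("target must lie between 0 and the Buffer B concentration")
    return 100.0 * target_mM / BUFFER_B_IMIDAZOLE_MM


def mixing_table(targets_mM=(40, 75, 150, 500)):
    """Map each target concentration to (% Lysis Buffer A, % Buffer B)."""
    return {mM: (100.0 - percent_buffer_b(mM), percent_buffer_b(mM)) for mM in targets_mM}


for mM, (pct_a, pct_b) in mixing_table().items():
    print(f"{mM} mM imidazole: {pct_a:.0f}% Lysis Buffer A / {pct_b:.0f}% Buffer B")
```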

The following alternative method may be used to purify a T. pallidum polypeptide expressed in
E. coli when it is present in the form of inclusion bodies. Unless otherwise specified, all of the
following steps are conducted at 4-10°C.

Upon completion of the production phase of the E. coli fermentation, the cell culture is
cooled to 4-10°C and the cells are harvested by continuous centrifugation at 15,000 rpm
(Heraeus Sepatech). On the basis of the expected yield of protein per unit weight of cell paste
and the amount of purified protein required, an appropriate amount of cell paste, by weight, is
suspended in a buffer solution containing 100 mM Tris, 50 mM EDTA, pH 7.4. The cells are
dispersed to a homogeneous suspension using a high shear mixer.

The cells are then lysed by passing the solution through a microfluidizer (Microfluidics
Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is then mixed with NaCl
solution to a final concentration of 0.5 M NaCl, followed by centrifugation at 7000 x g for 15
min. The resultant pellet is washed again using 0.5 M NaCl, 100 mM Tris, 50 mM EDTA, pH
7.4.

The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15 min., the pellet is
discarded and the T. pallidum polypeptide-containing supernatant is incubated at 4°C overnight to
allow further GuHCl extraction.

Following high speed centrifugation (30,000 x g) to remove insoluble particles, the
GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with 20 volumes of
buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by vigorous stirring.
The refolded diluted protein solution is kept at 4°C without mixing for 12 hours prior to further
purification steps.

To clarify the refolded T. pallidum polypeptide solution, a previously prepared tangential
filtration unit equipped with 0.16 µm membrane filter with appropriate surface area (e.g.,
Filtron), equilibrated with 40 mM sodium acetate, pH 6.0 is employed. The filtered sample is
loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive Biosystems). The column is
washed with 40 mM sodium acetate, pH 6.0 and eluted with 250 mM, 500 mM, 1000 mM, and
1500 mM NaCl in the same buffer, in a stepwise manner. The absorbance at 280 nm of the
effluent is continuously monitored. Fractions are collected and further analyzed by SDS-PAGE.

Fractions containing the T. pallidum polypeptide are then pooled and mixed with 4
volumes of water. The diluted sample is then loaded onto a previously prepared set of tandem
columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion (Poros CM-20,
Perseptive Biosystems) exchange resins. The columns are equilibrated with 40 mM sodium
acetate, pH 6.0. Both columns are washed with 40 mM sodium acetate, pH 6.0, 200 mM NaCl.
The CM-20 column is then eluted using a 10 column volume linear gradient ranging from 0.2 M
NaCl, 50 mM sodium acetate, pH 6.0 to 1.0 M NaCl, 50 mM sodium acetate, pH 6.5. Fractions
are collected under constant A280 monitoring of the effluent. Fractions containing the T. pallidum
polypeptide (determined, for instance, by 16% SDS-PAGE) are then pooled.
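
The NaCl concentration at any point of the 10 column volume linear gradient can be estimated by linear interpolation. The sketch below assumes an ideal gradient with no mixing delay or column dead volume, and it ignores the accompanying pH shift from 6.0 to 6.5; the function and parameter names are illustrative only.

```python
def nacl_at(cv: float, start_m: float = 0.2, end_m: float = 1.0,
            length_cv: float = 10.0) -> float:
    """NaCl concentration (M) after `cv` column volumes of an ideal linear gradient."""
    if not 0.0 <= cv <= length_cv:
        raise ValueError("cv must lie within the gradient")
    return start_m + (end_m - start_m) * cv / length_cv


# Tabulate the nominal salt concentration at a few points along the gradient
for cv in (0, 2.5, 5, 7.5, 10):
    print(f"{cv:>4} CV: {nacl_at(cv):.2f} M NaCl")
```

A fraction eluting halfway through the gradient would thus be expected at about 0.6 M NaCl, which can help when choosing fractions for SDS-PAGE analysis.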

The resultant T. pallidum polypeptide exhibits greater than 95% purity after the above
refolding and purification steps. No major contaminant bands are observed from Coomassie
blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The purified protein
is also tested for endotoxin/LPS contamination, and typically the LPS content is less than 0.1
ng/ml according to LAL assays.

6(b). Alternative Expression and Purification of T. pallidum polypeptides in E. coli

The vector pQE10 is alternatively used to clone and express some of the polypeptides of
the present invention for use in the soft tissue and systemic infection models discussed below.
The difference is that an inserted DNA fragment encoding a polypeptide expresses that
polypeptide with the six His residues (i.e., a "6 X His tag") covalently linked to the amino
terminus of that polypeptide. The bacterial expression vector pQE10 (QIAGEN, Inc., 9259 Eton
Avenue, Chatsworth, CA, 91311) was used in this example. The components of the pQE10
plasmid are arranged such that the inserted DNA sequence encoding a polypeptide of the present
invention expresses the polypeptide with the six His residues (i.e., a "6 X His tag") covalently
linked to the amino terminus.

The DNA sequences encoding the desired portions of a polypeptide of SEQ ID NOS: 1-744
were amplified using PCR oligonucleotide primers from genomic T. pallidum DNA. The
PCR primers anneal to the nucleotide sequences encoding the desired amino acid sequence of a
polypeptide of the present invention. Additional nucleotides containing restriction sites to
facilitate cloning in the pQE10 vector were added to the 5' and 3' primer sequences, respectively.

For cloning a polypeptide of the present invention, the 5' and 3' primers were selected to
amplify their respective nucleotide coding sequences. One of ordinary skill in the art would
appreciate that the point in the protein coding sequence where the 5' and 3' primers begin may
be varied to amplify a DNA segment encoding any desired portion of a polypeptide of the present
invention. The 5' primer was designed so the coding sequence of the 6 X His tag is aligned with
the restriction site so as to maintain its reading frame with that of the T. pallidum polypeptide. The
3' primer was designed to include a stop codon. The amplified DNA fragment was then cloned, and
the protein expressed, as described above for the pQE60 plasmid.
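
The primer layout described above (a restriction site added 5' of the annealing region, and a stop codon carried by the 3' primer) can be sketched as follows. This is an illustration only: the coding sequence is hypothetical, the BamHI (GGATCC) and HindIII (AAGCTT) recognition sequences are arbitrary examples whose suitability for a given vector must be checked separately, and the sketch does not verify the reading frame relative to any vector-encoded tag.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(coding_seq: str, site_5: str, site_3: str,
                   anneal_len: int = 18, stop_codon: str = "TAA"):
    """Build a forward and a reverse primer, both written 5' to 3'.

    forward: restriction site + first `anneal_len` bases of the coding sequence
    reverse: restriction site + reverse complement of
             (last `anneal_len` coding bases + stop codon)
    """
    cds = coding_seq.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    if len(cds) < anneal_len:
        raise ValueError("coding sequence shorter than annealing length")
    forward = site_5 + cds[:anneal_len]
    reverse = site_3 + reverse_complement(cds[-anneal_len:] + stop_codon)
    return forward, reverse


# Hypothetical 13-codon coding region, for illustration only
example_cds = "ATGAAACGTCTGGCTATTGCAGGTCTGCTGGTTGCAGGC"
fwd, rev = design_primers(example_cds, site_5="GGATCC", site_3="AAGCTT")
print("5' primer:", fwd)
print("3' primer:", rev)
```

Changing the slice of the coding sequence passed to the function corresponds to varying the point at which the primers begin, so that any desired portion of a polypeptide can be targeted.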

The DNA sequences encoding the amino acid sequences of SEQ ID NOS: 1-744 may also
be cloned and expressed as fusion proteins by a protocol similar to that described directly above,
wherein the pET-32b(+) vector (Novagen, 601 Science Drive, Madison, WI 53711) is
preferentially used in place of pQE10.

The above methods are not limited to the polypeptide fragments actually produced. The
above method, like the methods below, can be used to produce either full length polypeptides or
desired fragments thereof.

6(c). Alternative Expression and Purification of T. pallidum polypeptides in E. coli

The bacterial expression vector pQE60 is used for bacterial expression in this example 
(QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA, 91311). However, in this example, the 
polypeptide coding sequence is inserted such that translation of the six His codons is prevented 
and, therefore, the polypeptide is produced with no 6 X His tag. 

The DNA sequence encoding the desired portion of the T. pallidum amino acid sequence
is amplified from a T. pallidum genomic DNA prep or the deposited DNA clones using PCR
oligonucleotide primers which anneal to the 5' and 3' nucleotide sequences corresponding to the
desired portion of the T. pallidum polypeptides. Additional nucleotides containing restriction
sites to facilitate cloning in the pQE60 vector are added to the 5' and 3' primer sequences.

For cloning a T. pallidum polypeptide of the present invention, 5' and 3' primers are
selected to amplify their respective nucleotide coding sequences. One of ordinary skill in the art
would appreciate that the point in the protein coding sequence where the 5' and 3' primers begin
may be varied to amplify a DNA segment encoding any desired portion of a polypeptide of the
present invention. The 5' and 3' primers contain appropriate restriction sites followed by
nucleotides complementary to the 5' and 3' ends of the coding sequence, respectively. The 3'
primer is additionally designed to include an in-frame stop codon.

The amplified T. pallidum DNA fragments and the vector pQE60 are digested with
restriction enzymes recognizing the sites in the primers and the digested DNAs are then ligated
together. Insertion of the T. pallidum DNA into the restricted pQE60 vector places the T.
pallidum protein coding region including its associated stop codon downstream from the IPTG-
inducible promoter and in-frame with an initiating AUG. The associated stop codon prevents
translation of the six histidine codons downstream of the insertion point.
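
The role of the in-frame stop codon can be checked on a candidate construct before cloning. The following is a minimal sketch using hypothetical sequences; the downstream segment merely stands in for six histidine codons and is not the actual pQE60 sequence, and the helper names are illustrative only.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def his_tag_is_translated(insert: str, downstream: str) -> bool:
    """True if translation starting at the insert's first codon reads into `downstream`."""
    if len(insert) % 3 != 0:
        raise ValueError("insert must be a whole number of codons")
    stop = first_stop_index(insert + downstream)
    return stop is None or stop >= len(insert) // 3


# Hypothetical stand-in for six His codons followed by a vector stop codon
his_region = "CATCACCATCACCATCAC" + "TAA"

print(his_tag_is_translated("ATGAAACGTTAA", his_region))  # insert ends with a stop codon
print(his_tag_is_translated("ATGAAACGT", his_region))     # insert lacks a stop codon
```

With the stop codon present the function reports that the histidine codons are not translated; without it, translation would continue into the tag, which is the arrangement used when a tagged product is wanted.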

The ligation mixture is transformed into competent E. coli cells using standard procedures
such as those described by Sambrook et al. E. coli strain M15/rep4, containing multiple copies
of the plasmid pREP4, which expresses the lac repressor and confers kanamycin resistance
("Kanr"), is used in carrying out the illustrative example described herein. This strain, which is
only one of many that are suitable for expressing T. pallidum polypeptide, is available
commercially (QIAGEN, Inc., supra). Transformants are identified by their ability to grow on
LB plates in the presence of ampicillin and kanamycin. Plasmid DNA is isolated from resistant
colonies and the identity of the cloned DNA confirmed by restriction analysis, PCR and DNA
sequencing.

Clones containing the desired constructs are grown overnight ("O/N") in liquid culture in
LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25 µg/ml). The O/N
culture is used to inoculate a large culture, at a dilution of approximately 1:25 to 1:250. The cells
are grown to an optical density at 600 nm ("OD600") of between 0.4 and 0.6.
Isopropyl-β-D-thiogalactopyranoside ("IPTG") is then added to a final concentration of 1 mM to
induce transcription from the lac repressor sensitive promoter, by inactivating the lacI repressor.
Cells subsequently are incubated further for 3 to 4 hours. Cells then are harvested by centrifugation.
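
The inoculation and induction volumes follow from simple dilution arithmetic. The sketch below assumes a 1 liter production culture and a 1 M IPTG stock, neither of which is specified above, and neglects the small volume added by the stock; the function names are illustrative only.

```python
def inoculum_volume_ml(culture_volume_ml: float, dilution_factor: float) -> float:
    """Volume of overnight culture needed for a 1:dilution_factor inoculation."""
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return culture_volume_ml / dilution_factor


def stock_volume_ml(final_mm: float, culture_volume_ml: float, stock_mm: float) -> float:
    """Stock volume giving `final_mm` in the culture (C1*V1 = C2*V2, added volume neglected)."""
    if stock_mm <= 0:
        raise ValueError("stock_mm must be positive")
    return final_mm * culture_volume_ml / stock_mm


culture_ml = 1000.0  # assumed culture size
print("Inoculum at 1:25 :", inoculum_volume_ml(culture_ml, 25), "ml")
print("Inoculum at 1:250:", inoculum_volume_ml(culture_ml, 250), "ml")
# Assumed 1 M (1000 mM) IPTG stock, final concentration 1 mM
print("IPTG stock       :", stock_volume_ml(1.0, culture_ml, 1000.0), "ml")
```

For the assumed 1 liter culture, the stated dilution range corresponds to between 4 ml and 40 ml of overnight culture, and 1 ml of the assumed stock gives 1 mM IPTG.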

To purify the T. pallidum polypeptide, the cells are then stirred for 3-4 hours at 4°C in
6 M guanidine-HCl, pH 8. The cell debris is removed by centrifugation, and the supernatant
containing the T. pallidum polypeptide is dialyzed against 50 mM Na-acetate buffer pH 6,
supplemented with 200 mM NaCl. Alternatively, the protein can be successfully refolded by
dialyzing it against 500 mM NaCl, 20% glycerol, 25 mM Tris/HCl pH 7.4, containing protease
inhibitors. After renaturation the protein can be purified by ion exchange, hydrophobic
interaction and size exclusion chromatography. Alternatively, an affinity chromatography step
such as an antibody column can be used to obtain pure T. pallidum polypeptide. The purified
protein is stored at 4°C or frozen at -80°C.

The following alternative method may be used to purify T. pallidum polypeptides
expressed in E. coli when they are present in the form of inclusion bodies. Unless otherwise
specified, all of the following steps are conducted at 4-10°C.

Upon completion of the production phase of the E. coli fermentation, the cell culture is
cooled to 4-10°C and the cells are harvested by continuous centrifugation at 15,000 rpm
(Heraeus Sepatech). On the basis of the expected yield of protein per unit weight of cell paste
and the amount of purified protein required, an appropriate amount of cell paste, by weight, is
suspended in a buffer solution containing 100 mM Tris, 50 mM EDTA, pH 7.4. The cells are
dispersed to a homogeneous suspension using a high shear mixer.

The cells are then lysed by passing the solution through a microfluidizer (Microfluidics,
Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is then mixed with NaCl
solution to a final concentration of 0.5 M NaCl, followed by centrifugation at 7000 x g for 15
min. The resultant pellet is washed again using 0.5 M NaCl, 100 mM Tris, 50 mM EDTA, pH
7.4.

The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15 min., the pellet is
discarded and the T. pallidum polypeptide-containing supernatant is incubated at 4°C overnight to
allow further GuHCl extraction.

Following high speed centrifugation (30,000 x g) to remove insoluble particles, the
GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with 20 volumes of
buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by vigorous stirring.
The refolded diluted protein solution is kept at 4°C without mixing for 12 hours prior to further
purification steps.

To clarify the refolded T. pallidum polypeptide solution, a previously prepared tangential
filtration unit equipped with a 0.16 µm membrane filter with appropriate surface area (e.g.,
Filtron), equilibrated with 40 mM sodium acetate, pH 6.0, is employed. The filtered sample is
loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive Biosystems). The column is
washed with 40 mM sodium acetate, pH 6.0 and eluted with 250 mM, 500 mM, 1000 mM, and
1500 mM NaCl in the same buffer, in a stepwise manner. The absorbance at 280 nm of the
effluent is continuously monitored. Fractions are collected and further analyzed by SDS-PAGE.

Fractions containing the T. pallidum polypeptide are then pooled and mixed with 4
volumes of water. The diluted sample is then loaded onto a previously prepared set of tandem
columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion (Poros CM-20,
Perseptive Biosystems) exchange resins. The columns are equilibrated with 40 mM sodium
acetate, pH 6.0. Both columns are washed with 40 mM sodium acetate, pH 6.0, 200 mM NaCl.
The CM-20 column is then eluted using a 10 column volume linear gradient ranging from 0.2 M
NaCl, 50 mM sodium acetate, pH 6.0 to 1.0 M NaCl, 50 mM sodium acetate, pH 6.5. Fractions
are collected under constant A280 monitoring of the effluent. Fractions containing the T. pallidum
polypeptide (determined, for instance, by 16% SDS-PAGE) are then pooled.

The resultant T. pallidum polypeptide exhibits greater than 95% purity after the above
refolding and purification steps. No major contaminant bands are observed from Coomassie
blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The purified protein
is also tested for endotoxin/LPS contamination, and typically the LPS content is less than 0.1
ng/ml according to LAL assays.

6(d). Cloning and Expression of T. pallidum in Other Bacteria 

T. pallidum polypeptides can also be produced in: T. pallidum using the methods of S.
Skinner et al. (1988) Mol. Microbiol. 2:289-297 or J. I. Moreno (1996) Protein Expr. Purif.
8(3):332-340; Lactobacillus using the methods of C. Rush et al., 1997 Appl. Microbiol.
Biotechnol. 47(5):537-542; or in Bacillus subtilis using the methods of Chang et al., U.S. Patent
No. 4,952,508.

7. Cloning and Expression in COS Cells 

A T. pallidum expression plasmid is made by cloning a portion of the DNA encoding a T.
pallidum polypeptide into the expression vector pDNAI/Amp or pDNAIII (which can be
obtained from Invitrogen, Inc.). The expression vector pDNAI/Amp contains: (1) an E. coli
origin of replication effective for propagation in E. coli and other prokaryotic cells; (2) an
ampicillin resistance gene for selection of plasmid-containing prokaryotic cells; (3) an SV40
origin of replication for propagation in eukaryotic cells; (4) a CMV promoter, a polylinker, an
SV40 intron; (5) several codons encoding a hemagglutinin fragment (i.e., an "HA" tag to
facilitate purification) followed by a termination codon and polyadenylation signal arranged so
that a DNA can be conveniently placed under expression control of the CMV promoter and
operably linked to the SV40 intron and the polyadenylation signal by means of restriction sites in
the polylinker. The HA tag corresponds to an epitope derived from the influenza hemagglutinin
protein described by Wilson et al. 1984 Cell 37:767. The fusion of the HA tag to the target
protein allows easy detection and recovery of the recombinant protein with an antibody that
recognizes the HA epitope. pDNAIII contains, in addition, the selectable neomycin marker.

A DNA fragment encoding a T. pallidum polypeptide is cloned into the polylinker region
of the vector so that recombinant protein expression is directed by the CMV promoter. The
plasmid construction strategy is as follows. The DNA from a T. pallidum genomic DNA prep is
amplified using primers that contain convenient restriction sites, much as described above for
construction of vectors for expression of T. pallidum in E. coli. The 5' primer contains a Kozak
sequence, an AUG start codon, and nucleotides of the 5' coding region of the T. pallidum
polypeptide. The 3' primer contains nucleotides complementary to the 3' coding sequence of the
T. pallidum DNA, a stop codon, and a convenient restriction site.

The PCR amplified DNA fragment and the vector, pDNAI/Amp, are digested with
appropriate restriction enzymes and then ligated. The ligation mixture is transformed into an
appropriate E. coli strain such as SURE™ (Stratagene Cloning Systems, La Jolla, CA 92037),
and the transformed culture is plated on ampicillin media plates which then are incubated to allow
growth of ampicillin resistant colonies. Plasmid DNA is isolated from resistant colonies and
examined by restriction analysis or other means for the presence of the fragment encoding the T.
pallidum polypeptide.

For expression of a recombinant T. pallidum polypeptide, COS cells are transfected with
an expression vector, as described above, using DEAE-dextran, as described, for instance, by
Sambrook et al. (supra). Cells are incubated under conditions for expression of T. pallidum by
the vector.

Expression of the T. pallidum-HA fusion protein is detected by radiolabeling and
immunoprecipitation, using methods described in, for example, Harlow et al., supra. To this
end, two days after transfection, the cells are labeled by incubation in media containing
35S-cysteine for 8 hours. The cells and the media are collected, and the cells are washed and then
lysed with detergent-containing RIPA buffer: 150 mM NaCl, 1% NP-40, 0.1% SDS, 1% NP-40,
0.5% DOC, 50 mM TRIS, pH 7.5, as described by Wilson et al. (supra). Proteins are
precipitated from the cell lysate and from the culture media using an HA-specific monoclonal
antibody. The precipitated proteins then are analyzed by SDS-PAGE and autoradiography. An
expression product of the expected size is seen in the cell lysate, which is not seen in negative
controls.

8. Cloning and Expression in CHO Cells

The vector pC4 is used for the expression of T. pallidum polypeptide in this example.
Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Accession No. 37146). The
plasmid contains the mouse DHFR gene under control of the SV40 early promoter. Chinese
hamster ovary cells or other cells lacking dihydrofolate reductase activity that are transfected with
these plasmids can be selected by growing the cells in a selective medium (alpha minus MEM, Life
Technologies) supplemented with the chemotherapeutic agent methotrexate. The amplification of
the DHFR genes in cells resistant to methotrexate (MTX) has been well documented. See, e.g.,
Alt et al., 1978, J. Biol. Chem. 253:1357-1370; Hamlin et al., 1990, Biochem. et Biophys.
Acta, 1097:107-143; Page et al., 1991, Biotechnology 9:64-68. Cells grown in increasing
concentrations of MTX develop resistance to the drug by overproducing the target enzyme,
DHFR, as a result of amplification of the DHFR gene. If a second gene is linked to the DHFR
gene, it is usually co-amplified and over-expressed. It is known in the art that this approach may
be used to develop cell lines carrying more than 1,000 copies of the amplified gene(s).
Subsequently, when the methotrexate is withdrawn, cell lines are obtained which contain the
amplified gene integrated into one or more chromosome(s) of the host cell.

Plasmid pC4 contains the strong promoter of the long terminal repeat (LTR) of the Rous
Sarcoma Virus, for expressing a polypeptide of interest, Cullen, et al. (1985) Mol. Cell. Biol.
5:438-447; plus a fragment isolated from the enhancer of the immediate early gene of human
cytomegalovirus (CMV), Boshart, et al., 1985, Cell 41:521-530. Downstream of the promoter
are the following single restriction enzyme cleavage sites that allow the integration of the genes:
Bam HI, Xba I, and Asp 718. Behind these cloning sites the plasmid contains the 3' intron and
polyadenylation site of the rat preproinsulin gene. Other high efficiency promoters can also be
used for the expression, e.g., the human β-actin promoter, the SV40 early or late promoters or
the long terminal repeats from other retroviruses, e.g., HIV and HTLVI. Clontech's Tet-Off and
Tet-On gene expression systems and similar systems can be used to express the T. pallidum
polypeptide in a regulated way in mammalian cells (Gossen et al., 1992, Proc. Natl. Acad. Sci.
USA 89:5547-5551). For the polyadenylation of the mRNA other signals, e.g., from the human
growth hormone or globin genes can be used as well. Stable cell lines carrying a gene of interest
integrated into the chromosomes can also be selected upon co-transfection with a selectable
marker such as gpt, G418 or hygromycin. It is advantageous to use more than one selectable
marker in the beginning, e.g., G418 plus methotrexate.

The plasmid pC4 is digested with the restriction enzymes and then dephosphorylated
using calf intestinal phosphatase by procedures known in the art. The vector is then isolated from
a 1% agarose gel. The DNA sequence encoding the T. pallidum polypeptide is amplified using
PCR oligonucleotide primers corresponding to the 5' and 3' sequences of the desired portion of
the gene. A 5' primer containing a restriction site, a Kozak sequence, an AUG start codon, and
nucleotides of the 5' coding region of the T. pallidum polypeptide is synthesized and used. A 3'
primer, containing a restriction site, stop codon, and nucleotides complementary to the 3' coding
sequence of the T. pallidum polypeptides is synthesized and used. The amplified fragment is
digested with the restriction endonucleases and then purified again on a 1% agarose gel. The
isolated fragment and the dephosphorylated vector are then ligated with T4 DNA ligase. E. coli
HB101 or XL-1 Blue cells are then transformed and bacteria are identified that contain the
fragment inserted into plasmid pC4 using, for instance, restriction enzyme analysis.

Chinese hamster ovary cells lacking an active DHFR gene are used for transfection. Five
µg of the expression plasmid pC4 is cotransfected with 0.5 µg of the plasmid pSVneo using a
lipid-mediated transfection agent such as Lipofectin™ or LipofectAMINE™ (Life Technologies,
Gaithersburg, MD). The plasmid pSV2-neo contains a dominant selectable marker, the neo gene
from Tn5 encoding an enzyme that confers resistance to a group of antibiotics including G418.
The cells are seeded in alpha minus MEM supplemented with 1 mg/ml G418. After 2 days, the
cells are trypsinized and seeded in hybridoma cloning plates (Greiner, Germany) in alpha minus
MEM supplemented with 10, 25, or 50 ng/ml of methotrexate plus 1 mg/ml G418. After about
10-14 days single clones are trypsinized and then seeded in 6-well petri dishes or 10 ml flasks
using different concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400 nM, 800 nM).
Clones growing at the highest concentrations of methotrexate are then transferred to new 6-well
plates containing even higher concentrations of methotrexate (1 µM, 2 µM, 5 µM, 10 µM, 20
µM). The same procedure is repeated until clones are obtained which grow at a concentration of
100-200 µM. Expression of the desired gene product is analyzed, for instance, by SDS-PAGE
and Western blot or by reversed phase HPLC analysis.
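
Because the selection steps above quote methotrexate both in ng/ml and in molar units, a unit conversion can help when comparing them. The sketch below assumes the molecular weight of methotrexate free acid, approximately 454.44 g/mol; a salt or hydrate form would require a different value, and the function name is illustrative only.

```python
MTX_MW_G_PER_MOL = 454.44  # assumed: methotrexate free acid


def ng_per_ml_to_nm(conc_ng_per_ml: float, mw: float = MTX_MW_G_PER_MOL) -> float:
    """Convert a mass concentration in ng/ml to a molar concentration in nM.

    1 ng/ml equals 1 µg/L; dividing by g/mol gives µmol/L; multiplying by 1000 gives nmol/L.
    """
    if mw <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ng_per_ml / mw * 1000.0


# Initial selection concentrations quoted in ng/ml
for c in (10, 25, 50):
    print(f"{c} ng/ml is about {ng_per_ml_to_nm(c):.0f} nM")
```

Under this assumption the initial 10 to 50 ng/ml range corresponds to roughly 22 to 110 nM, which overlaps the lower end of the subsequent 50 to 800 nM series.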

The disclosure of all publications (including patents, patent applications, journal articles,
laboratory manuals, books, or other documents) cited herein is hereby incorporated by reference
in its entirety. SEQ ID NOS: 1-744 are hereby incorporated into the specification by reference.

The present invention is not to be limited in scope by the specific embodiments described
herein, which are intended as single illustrations of individual aspects of the invention.
Functionally equivalent methods and components, in addition to those shown and described
herein, are within the scope of the invention and will become apparent to those skilled in the art
from the foregoing description and accompanying drawings. Such modifications are intended to
fall within the scope of the appended claims.

TABLE 3.

Treponema pallidum - Putative coding regions of novel proteins not similar to known proteins

(1) GENERAL INFORMATION:

(i) APPLICANT: Human Genome Sciences Inc., et al.

(ii) TITLE OF INVENTION: Treponema pallidum Polynucleotides and Sequences

(iii) NUMBER OF SEQUENCES: 744 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc. 

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: USA 

(F) ZIP: 20850 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER: HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: Unassigned 

(B) FILING DATE: June 23, 1998

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/050,667

(B) FILING DATE: June 24, 1997

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Brookes, A. Anders 

(B) REGISTRATION NUMBER: 36,373 

(C) REFERENCE/DOCKET NUMBER: PB387PCT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX: (301) 309-8512 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14063 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AAGGTTGTTT GTTGATAATC TCTGCCATAT TACTGTTCCC TTCTTTTCGT TCATCGGGTA 60 

AGAGCCGTCA GCGGTGAGCG CGCCACcTCC TTCACTAATC ACcACGTCGT GAACAGGCGC 120 

TGCcGTAcAG CCGACCACCA ACGGCTCTTC TACCCACTCA ACTCTACATT CCACGCTAGC 180 

GAGCCAACGC AGCAACGATC GTATCGATAT TGTGTGTTAC CATCCCTACG TAGGTACCCT 240 

CGCTCGTACC CGCATCCCCC ATCGCATCAG AAAACAACTC GCCTCCAATn CTGnTACGTG 300 

CCCTCTTGCC TGCACCGCAT CCCTTAACGC TTCAACGTTT TTGTGCGGAA TAGAACTCTC 360 

AATAAAGATA GCAGGGAGTT irtTACnTGCGC AATAAACGCT GCCAGTTCCT GCATATCATG 420 

CGCACTGGCT TCCGAAGCGG TGCTCACCCC TTGCAACCCC TTCACCTCAA AACCATACGC 480 

ACGGCTAAAA TAGCCGAACG CATCATGAGC GGTCACCAAC ACACGCcTTT CAGCAGGCAG 540 

CGACTGCGCC TTGCGCCGAA CGTACGCGTC AAGCTTATCC AACTGCTGCT GGTACGCCTG 600 

ATAACGTTGA GTAAATTCGC GAGTTTTTCC CGGCAACAGC TTGCACAAGC TTTCGTACAC 660 

TGCCTTCACC GAATAAGACC ACAGCTTTAC ATCAAACCAC ACATGCGGAT CGAACTCTGC 720 

TTCCTCAAGA GAAAGACGCT GAGACACCGG AATAGTCTCA GAAACTGCAA CTACCAAGCG 780

GCTCCCGCGC AGTTTGGAAA ACACCTCGCC CATCTTGGTT TCCAGGTGCA ACCCGTTGTA 840

CAGGATGAGA TCCGCATTCC CGAGCCATTC CACATCCCCC GCAGTAGCCG TGTACAGGTG 900

CGGGTCAACA CCAGGACCCA TCAACCCCTT TAGATGCACA TCACCTTGAG CGATGTTTTT 960

GACAGCATCC GCTATCATGC CAATGGTGGT GACAACCAGG GGTTTCCCGT CCGCTGCX3GC 1020

ATCCTTGCTA CCGAATGCGT GCGTAAAACC GGTCAGCATG CCAAGCGCGA GCACX5CAGGC 1080

ACATATTCTT TCACGTATCA AGTGACACTC CTTGGGTGAA TTTgATGCAT CAAAGTAGCG 1140

GATCACGCAG GAAACGTCAA TTTTTCAGTA CCcTTTCAGG AAAAGAAAAC GGCACTCTGG 1200

CGCGGACCTA CTCGAGCTAC ATGATAAAGA AGCATTTGAT CTTCCTCGCA TAAGCAGCTA 1260

CAGTTGCGCC CTCTTATGGA TCACACCGCG TCACTGAGTC CTGTGCGCCC TGAAGCACAA 1320

CCTACAGACG ATCGCGAGCG TGCGCTCAGA CCGCGCCTCC TGAAAGACTT TCTAGGTCAG 1380

GAGAAAACAA AACGCAACTT ACGTCTTTTC ATTCAGGCAG CGCGCGATCG CAACGAAAGC 1440

TTAGATCACC TGTTCCTCAT CGGCCXXrCCG GGGCTcGGCA AAACGACGCT CGCX3CATATC 1500

ACTGCATGCG AGCTGGGCGT TGAGTGCAAG GTTACTIGGCG CACCGGCGCT TGATAAACCA 1560

AAAGATTTAG CGGGTATCCT CACTGCGCTG AGTGAGCGAA GGCllTTCTTC GTGGATGAAA 1620

TCCACCGCCT CAAACCAGCC ATAGAAGAGA TGCTGTACAT TGCCATGGAG GACTACGAAC 1680

TGGATTGGGT TATCGGTCAG GGACCGTCCG CGCGCACX3GT GCGCATCCCA CTCCCCCCGT 1740

TTACCCTCAT TGGTGCAACC ACTCGCGCGG GTATGGTTTC AAGCCCGCTG ATTAGCCGCT 1800

TTGGAATCGT AGAGCGCTTC GAGTTCTATA CCCCTGAGGA GCTTGCTGCC ATTGTGCAAC 1860

GCTCAGCGCG GCTTCTAGAT ATCACGCTCG ACGCACGCGC AGTTnAGCCC TTGCXXTGGTG 1920

TTC6CGAGGA ACACCCCGGG TGGCCAACCG GCTTTTGCGC CX5TATACGCG ATTTTGCCCA 1980

AGTTGCGGGG TCTGCACACA TCAGCGAGAC GATAGTACGC GCAGGcTTGC CCACCTAAAG 2040

ATCXSACGAAT TAGGGCTAGA ACTGCACGAC ATACAGCTGC TGCGCGTCAT GaTTGAGCAC 2100

TTCX3GCGGAG GGCCAGTGGG CGCAGAAACG CTGGCGATCT CCCTCGGGGA ATCACCGGAA 2160

ACACTTGAGG ATTACTACGA GCCCTACCTT ATCCAAATTG GGCTCATGCA GCGCACXTCCC 2220

CGCGGGCGCA TGGCCACCGC GCGTGCCTAT GCGCACCTAG GTCTCCCTGT CCCCGAGGCA 2280

CGCACGCTCA CCCCGCACTC CCCAGAACAA GGAACGCTTC TTTAGCAAAG ATGCGGACAC 2340

CTTGTCAGAG CTGTCGGCAG tCCGCTCAGA CCGGGTAAGA CACAAGAGTC TGAAAAAGGC 2400

ATATACTACG CGCX3GGAGGG GTGCGTTTCG TGAAGAT6TC TGCGTTTTTT GCACCAACXTt 2460

GCGGTCTGCA CCTGCTGATG CAACCATCGC AAGCCACCAG CTGCTCATGC GCGCAGGGTA 2520

CGTCAGAAAA ATCGCCAACG GCCTGTTTGC GTACCTTCCC CTGGGCcTGC GCGTTCGACA 2580

CAAAATTGAA GCGATTATTC GGGAAGAACT CGAGGCTATC GGGTGTTTGG AGTGCACCGC 2640

GCCTGTCGTG ACTCCTGCAG AGTTGTGGAA GGAATCTGGC CGCTGGTACC GCATGGGCGC 2700

AGAGCTTTTG CGCGCCAAAA ATCGGCTCGA TCACGAGCTC CTTTTCAGTC CGACTGCAGA 2760

AGAATCCTTC ACCGCTTTGG TGCGCXSGCGA CTGTACTTCC TACAAACATT TTCCCCTCAG 2820

TCTCTACCAA ATCAACGCAA AATATCGCGA TGAAATCCGT CCGCGTTACG GACTGATGCG 2880

CGCGCGCGAG TTCACCATGG CCGACGCCTA TTCTTTCCAC ACAGACTGCG CATGCCTTGC 2940

GCGCACGTAC GAAAAGTTTG CX3CACGCGTA TCGCGCCATT TTCCGTCGCA TCGGCCTATC 3000

AGTCATTGCA GTACATGCAC ACcTCGGTGC GATGGGGGGG CAGGAATCCG AGGAATTCAT 3060

GGTAGAGTCC GCGGTGGGCG ACAACACGCT CCTGTTGTGT CCCCACTGcA CCTACGCTGg 3120

CAAATTGCGA AAAGGCCGTC GGACAGCGCC CCCTCCCAGA CACX3CATGAC ACTCATCTAA 3180

AAGACGAACA CGAAGGgTCA GATCTCAAGA CGCCTGCAGC AATGCGCGAG GTGCACACCC 3240

CGCACGTGAA AACTATTGAG GAACTTGAAC ACTTCTTGCA C6TACCTGCA CATCGCTGCA 3300

TCAAGACGCT TATTTACCX3C ATTGACACGG TGCCCCAGGC GGCTGGGCAT TTTGTGGCAG 3360

TGTGCATCCG CGGCGACCTA GAACTCAACG AGTCAAAGCT CGAAGCGCTC CTGCGCGTGC 3420

CATCTGTAGT ACTGGCAACT GAACAAGAGG TGTATGCACT CAGCGGCACC CCCGTAGGAT 3480

TCATTGGTCC GGTAGGAcTT GCACAGCGTG CTGCAGCTGC GTATGCCGCT CGCACCCtGC 3540

GTTCTTCCCC TCCGCTGCTG AGCCTGCATC CX3TCACTTCT GACATTCCAT TTTTTTCCCT 3600

CGTTGCAGAT CAGTCCGTGA TGGCTATGCA CAACGCTATC ACCGGTGCGT TGAAAGTTGA 3660

CACGCATCTT GTGCAGGTAG AACCGGGTCG AGACTTTGTT CCTGACGCAg TTGCAGATCT 3720

CATGCTCGTG CGCGCCGGCG ACCGGTGCAT ACACTGTGGA GCGCCCCTAT ACGAAAAAAA 3780

GGGTAACGAA CTAGGTCACC TCTTTAAATT AQGGGACAAA TACACGCX^CA gcATGcACCT 3840

TACCTTTACT GATGAGCAGG GTGTACGACA GTTCCCCCTG ATGGGCTGCT ATGGCATTGG 3900

CCTTGATCGC ACGCTTGCCT CTGTGGTGGA AAACCACCAT GACACGCGGG GTATCAGCTG 3960

GCCGCTTGCG ATCAGCCCCT ATGCAGTTGT GCTCATACCC ATCCCTCACA CGCAGGCCCC 4020

CTATGCAGCA GCAGAGGCAC TGTACGTGCA GCTGCGGACA CGGGGAGTTG AGGTACTGTT 4080

TGATGATCGT GCAGAGCGAC CCGGAGTAAA GTTCGCAGAC GCTGATTTAA TCGGTATTCC 4140

CTTCGTGTGG TACTGAGTGC GAAAAnCTAC CGCGCGTTGA ATGCaCAACA CGGTGTGGTG 4200

CGCACACGTA TTTTTTTACG CAAGAAGAGG CGTCCGAGCA CATTGCACGC COXSCTCGAAC 4260

AACTCGCTTC CCCGGAAAGT TCGTAAGAAC GGGAATGCCG GAGCGGGATC CAGCGCATGC 4320

AGTGCTGAGA CCTGCX3CATA ATAGCACAGT GTACGGCACC CGTGGTTTAG AAAAAAATGA 4380 

CX3AAGGAGAA AAGGAAAAAC GGTGTACATA AAGGTAGCGC TCGTGTGTCT TTTCAGCATG 4440 

GGAGCGCGGT GTCTTTTGGC CACAGAACCG GCGCCAGTCT CTGGAGATTA CGTATTGTAT 4500 

CGCGACTATT CGTGGAAATC GCCCACATGG GTTGGCTTTT TGTGCTACGA CGCACACACX3 4560 

TACGGTGCGC TGCTGTGTAC TCCGGCAGAA AGCCGCAGGA TCACAATTCT CTTCACGGGT 4620 

ACTGAAAAGC ACGGCCGCTT TGAGCTGACC GGACAACGCA TCACCTCACC GGTGCGCACA 4680 

GAGGATCTGA CTGGCATAAA TTATCTCATG GATCTTTTTC CTCAACTACA GCGCTGGAAG 4740 

CATTTTCCCC GGGATACACA CACCCTTGTT GCGCGGCATA CCGATCGGAG TAAAAAGAGC 4800 

ACACAATTCT CAGGGGCAGT CGAACTGCAG TTCGCTTCTT TTGTCCCCCT CTTCCACCTA 4860 

GAAATACTCC GTGATAAGCA GCAGCGCGTC ATGCTCCAGC TAAGCGAGAT AGGGAAGATC 4920 

GACCACACCA GTGACGCAGC CTTCTTTCAA TTCACCCCCA TGCCCCCGTC CACGCCCACT 4980 

GATGCACCGC CAGCAACX3CT TAATCAGACC CT6ACACGCA CGGAGTATGT CATCGATGAC 5040 

GTGTGCATTG CACTTGATCC GCAGTGGAAA AGAATTGCAG AAAATTCCTT TCTTTCAGAC 5100 

TTTGCCTTTC TCACCX3TACA CCAGGTGCCT GCACCGCGCG CGCACGACTA TTCTGCGCTC 5160 

CGTGCATTGC TGCAACTCTT TCTGTATTCA GGCCCTCAGG GAAAAAACAT TCTTGAACAA 5220 

CTCCATATCA ATGACACTCA CGCGCGTCTT ACGCTTTCCT ATGCAGTGTT TGACCTTCCG 5280 

TCAAAAACAG TTAAAAAGAC ATGGAAGATA TTCATCCGCC ACTCTGATAC GCACTACTCT 5340 

ATACTTAGTC TCACGGCGGA CCAgCGCACA GCGCAGsGTT ACXKX3CGCTA CTTTGACACG 5400 

CTCATTGAAA CTATCCGTAC AAAAAACTAA AAAATGCTGA ATTGGAGCAT ACCCGTGATT 5460 

AGACACATAT TATTTGACAT AGACAACACG CTGTACTCCT GTACAAATCC CATTGAAATG 5520 

GCTATCACGC AGCGCATACA CACATTTGTT GCACATTTTC TCCACX3TATC TTGTGAGGAG 5580 

GCGCGCGCGT TACGCCAGCG CACAAAGCAC CTCTATGCTA CCACCTTTGA GTGGTTAAAG 5640 

GCAGAGCACA ATCTCATTCA CGATGAACAC TACTTTCGTG CCGTATATCC TCCCACC6AA 5700 

ATACAGGAGT TGCAGTACGA TCCGATGcTC CGCCCTTTTT TACAGTCACT GCACATGCCA 5760 

CTGACGGCAT TAACTAACGC ACCGCGCGTG CACGCACAAC GCGTATTGGA TTTTTTTCAT 5820 

CTGTCAGACC TTTTTTTAGA TGTCTTTGAC ATCACGTATC ATGCAGGCAA GGGAAAACCA 5880 

CACCAtlAGCT GCTTTGTACG TACGCTTGAA GCGGTACACA AAACTGTGCA GGAAACGCTT 5940 

TTTGTCGATG ACTGTCTCAT GCACGTGCGT GCcTTTATTG CGCTTGGCGG ACATGCCGTG 6000

CTGGTTGACG AACGTGACTG TCATGCAGAA CTGCCTCCTT CTGCACGCAT GACACGCGTA 6060

AAAACAATTT ATGAATTGCC CGCACACCTT GCACGCCTCG CCCAAGGAGA CAATCAGTGA 6120 

GTATACATTC GTTGCAGCAG ACTTTTAGCG ACATCGTCCC GCTCCTGGAG CAGTATACGC 6180 

GCGCAGACCG CTTCATGCGG GAGGATAATT TGTTACACGA GAGAAACGAA CCTATCCGGC 6240 

GTATCGTTGA GTCCCTCGTC GCCCGCATAT TACTCCCCGG CTCCACAATG CGCGGAAATG 6300 

AGCAAATCGC ATCCTTTTTA CATAAAACCA ATGAAGGGAA ACGGGGACTC ATTCTTGCGG 6360 

AACACTACAG CT^TTTTGAC TTACCCTGTC TGCTCTACCT TATGGAACAA GGAAGTAGTG 6420 

CCGGGCGCAT GCTTTCAGAA AAAATCGTAT CTATTGCCGG TATTAAACTT CGTGAAGAAA 6480 

ATCGCATCCT GGCAATGCTC ACCGAAGgAT ATGATCACCT GGTGATATAT CCCAGTAGGA 6540 

GTTTGGCCAC CATCACTGAT GCGCACTGTC TTGCAAGAGA GACAAAGCGC AGCnGAGCAC 6600 

TGAATCGTGC AGCTATGAAG TATTTAGAGG AACTGCGCAA CGCGGGAAAG GTGATTCTCG 6660 

TGTTTCCTGC AGGGACACGC TACCGACCCG GGAGACCGGA AACAAAGCGA GGGGTGCGCG 6720 

AAGTATACTC CTACATAAAA CACGCCGAGG TACTGCTCCT TATTTCAATC AATGGGAATT 6780 

GTTTGCGCGT TGCAGAACGT TCAACTGATA TGACX3GAAGA CGCGGTGCAT CCGGATGTCG 6840 

TGCTTCTTGA AGCGCGCACT GTAGACGAcT jSCGCCCTTTT TCGAGAAAAA GCGCTGGACT 6900 

GGCACCGCAC ACACAACGTG GCGGCACCGT CAGAGGATAA AAAACAAATC GTAGTCGACT 6960 

ATCTCATGCA CCTTTTGGAA GAAATGCACG AGCACAATGA ACGAGAAAGG CTATCGTGAA 7020 

TTTTTCGCTG GAATTCCCCG TAAGATCCTA TGAGCTAGAC GGATACGGAC ACGTGAACAA 7080 

TGCGGTATAT CTCCAATATT TTGAATATGC GCGCGCCGCT TTTTTGCTCC ACATAGGGTT 7140 

CGACCTCAAA CAGTTGCACG AAGCAGGTTA CGCTTTCTAC GTAACCCAGG CGCACATTCA 7200 

CTAcCGCACT GCAGTGCATC TATTCGATAC GTT6CGCGCC CGGGTAAAAC CATTAAAGCT 7260 

CGGAAAAGCT TCCGGCGTCT TTTCACAGAC GCTGGAGAAC CAGCATCACG TGCTATGCGC 7320 

GGATGCGGAA ATTACCTGGG TGTGCGTTTC GCGCACAAGC GGCAAACCAA CTAAGATTCC 7380 

CCCCGAGTAT CTGGTACCTG CGCTGTATCC GAACTACTAG TCXTTCCCTTC TTTCCCCTTT 7440 

ACTCTCCCAA GGACATCACA CTACGGAAGG GTACGCATAC GCAGTAGGGA GGTAGGGTTT 7500 

ATCGCGGAGC CATTCTTATA GATTGTAAAA TGCAGGTGTG GTCCCGTGCT GCGTCCTGTT 7560 

TTTCCCAATA ATCCGATTTT tGTCGCGCTG GTGACGCGCG TACCTGCTGA AACCAACACC 7620 

GTCTGCAGAT GCCCATACAG GGTCTGATAC CCCGCGTGGT GCCCCACAAT CAGGTAATTA 7680 

CCATACACTG CACTGTATCX: AACCGTGCX3T ACAATCCCTC CGAGCGCCGA ATATACTGGG 7740 



WO 98/59034

GTACCCCGCC [illegible] [illegible] [illegible] [illegible] [illegible] 7800

TCAcTACGCC [illegible] [illegible] [illegible] [illegible] [illegible] 7860

AAGTCACCAT TAATTTCCTG CAACGCGCGT [illegible] [illegible] [illegible] 7920

ACGCGTGCAG GCTGCAATGG CTGCAcTGCG [illegible] [illegible] [illegible] 7980

GCAGAAGAAA ACGGAAAAGG CACGCAGgAC [illegible] [illegible] [illegible] 8040

ACCAGCGTAC GCACTGAAGG AGGTGACTCC [illegible] [illegible] [illegible] 8100


AATCGTTCTA AGGAGATCTG ATGCGCCGCC GCTATAGACG AAAACGTATC GCCGTTTTTT 8160

ACGGTATATA AAATGCCGTC CACTGAGGGG ATTTTTAGTA GCTGTCCAAC TTGGAGCGCC 8220

CGTtGyTGCG CAATTTATTC AAACTAATGA TTGCATCCTG ACTGATGTCA TAgcgCtGCG 8280

CAATCCTTCC TACCACATCA CCTTCACGCA tTCGTACACT GTGTAGTACA GTGCAGGCTC 8340

CGCATCTTCC TGCACX3ATAC GTGCACGGAG CAAGGAAGAC ACGTACCCCG ACGCCTGACG 8400

TGGTTCCTGC TCAGTGAGCG TGAGGGCAGG TGTCAATGGT TCCACCTGA6 CACCAAAGTA 8460

CGCAAGGGCA AGA6CAAG6A GCAACAGTGT TACGAACAGT AACAGTnGCC tACXXXSGACA 8520

GGTCTACACG GTTCTCGCAC AGTCTGTTTG GAACTTCGAC AGTACACGCT CACACCGGCT 8580

ATCCTTCAGG TGTACACACT GCCGTATCtG CGGGCAGGTT GCGTCT6TAC CTAACGCACC 8640

GTCTAGAGCG TCCACGCAC6 CAcGCGCGCG CGCGAGGGAG TCCGGCGGAA AAGAAGTTAA 8700

CACCXTGCATG AATGCAGGGC TCGGTGTTGT CCACACCTGT GCAGGCAGCG CCTGAAGACG 8760

CGATGAAGGA AAACGCACAC ACCAAGCAAG CAAAAAAAAG nGCGTGTAAC GCGCATTCCC 8820

GTGcTGACTC GCGCCACCGT ACGTGCGATC TCCTACCAGG GGGAATCCCT GTGCAGCGCA 8880

ATAACGGCGA ATCTGATGCT TTTTCCCcGT AACCGGCACA ATCACGCGGA GCACCAGCGC 8940

GCTATCACAG CTATGTAACA CTGTTTGCAC ATGCGTTACC TCTCCTGGAC gcacx:agcgt 9000

GCGCGCCX3CA GCAGCGGTGC gCGCAGOGGC GGCGGTGATC 6CAA6ATAAA ACTTGCXK:AA 9060

TGTATGCTGC TGCAACGCGG CAGAAAACCA CTGGGCACCG CGTAACGAGC GCGAAAAAGC 9120

AATCAGTCCC TCTGTCCCTC GGTCCAAGCG GTGCAACGGT CCAGGGCGGA ATGACAAAGC 9180

AGGGGGAACG TGCGCACGCC CTTGTCCCCT CACCCAGGCA TCCAGGctGC GCGGACCGTG 9240

CACAcAcAAC tGCGGGTTTA TGAAAAAAAA GCAAATCTTG TGTTTTAAAT ACCACCGAAG 9300

CAACACX3CGC ATTcGGTGTT CCAGGCATCT TCGAAAGACG ACTGGATGCT GCACX5CGCCC 9360

TACACAGGGA TTCAGGTAAA GAAAGCACAT CCCCCACCTG CACCCGCTTT GCAGGCTGCA 9420

CCGGACGACC ATTGAGCCGG ATAGCX»3TGC GGCGCACGCG GGCATACACC CCAACACGCG 9480

GACAGGCAGG CAACAATATT CGCAAAACAC QATCTACTCG TCTACCTGCA TCGTTTTTGG 9540

TGCAGCGAAA ACACTCAACA GCCGCTCCAC CATGGGGTCT CACGGTGGAA ACAGGAGGGA 9600 

CGACATCCAT ATGCACAGTG TGGGGAACX3T TAGACGAGAC CCACCTTTTC ACGCGACGAA 9660 

CACCTACTTT CATACACGGT GCX3TCCCCGT GCCGGATACC AGTTGCTCCT CCCAAACGTC 9720 

CTCCCCGTCT TTCCCAGTAC GACACCACAG CCCATGGCGG GTACAGCCGC CCCAGTATAG 9780 

CGCACACAAC GCTCCCTTGA CAAAGGTTTA GAGAGTATAG GAGACTGCCC CGCGATGGAC 9840 

GGTGGCTATT TTCTTGGCCA GCTGCATGCG GTGTTCAGTG GTGAAGTCTT CCTCTCTGCC 9900 

ACCTGTAGTT GGCTTGCAAG TCAGGTGATT AAAGTGGCTA TCGCATGCCG AAGtCGGCTA 9960 

TACGGTCGGT GCACGGCTTT TTTGATTTTG CTGTTTGGCG CACCGGCGGC ATGCCTTCGA 10020 

GTCACTCTGC TCTTGTGTCG GCGCTCACGC TCTCTTTTGC GCTCAAGTGC GGGTTGCATT 10080 

CGGATCTGTT CATCTTTTCC TTTTTCTCTG CCATCATTGT CGTGCGCGAC GCGCTCGGTG 10140 

TGCGCCGTTC AAGCGGCCTG CAGGCCGAGG CGCTCAATAG CCTCGGTGCG CGTGTTTCGG 10200 

AGAAACTTGA TTTTTCTTTC AGACCAGTGC GAGAGATTCA TGGACATAAA CCGCTGGAAG 10260 

TTGTCGTTGG CGTGGCAGTG GGCATCGTCA CGAGCGCTTT GTTCTACAGC TCCATGAGCC 10320 

CTTGAGTCTC CGGTGGACGT GCATGCAATG CGGcGGACCC CTCCACACAG AGGAAGAGGC 10380 

GGTGCTGTGC GCGTTCTCTC GTGTGTCCCT GCCGCGTCGG GAGGCGCAGA CCTTTTCTGT 10440 

ACCGTACAGA GGGCACACCA ATGATAGAGC GCCTACGGAG CAGTCGCGGG AAACTCACCC 10500 

TCACCCACCA GATTTTCCCC CTCAGCTTTG GGGGGAATGC TTTTTTGCCT GCGCGCXXX3C 10560 

TCGTTCCGTT CTCCGTTGAT GCTGGAGAGC CAGCCGCCGT CGCTGTGGTA AAGGTTGGGG 10620 

ATACGGTCCG AGAAGGTCAG CT6ATCGCAC GCGCCGCGCA CGCCGGTGCT GcTCACGCAC 10680 

ATGCCTCCGT CCCCGGTGTC GTCACXTCGCT TGGTAAGTGC TAATTTTCTC GCCGGTAGT6 10740 

CCCTGCGCGC TGTCGAGATT CGTACACGCG GTTCCTTCGA ACATCTTGGC AAGGTCCAAC 10800 

CAAATCGCCC GTGGCAGCAC AGCACCGCTT CAGAATTGct GCGCCTAGTT ACAGAOXSCAG 10860 

GAGTAGTGGC CACACGCCTA CATCCGCACG CCCAGATCAC GA6CACCGCA ACX3GGCACGC 10920 

ACGCGGGTGC ACAGCACACG TACGCGAAAG ACTACGGACA GAAGAGAAOG GCTGAAGCGC 10980 

ACACGCTGCG TCTCATGCGC GCGGCGTGGG AAAGCGGCAA TGCGCTCGCC ACGCACCTCC 11040 

ACCTGCACX3T GCGTAAGGGT GTACGGAAAC TTACGCTCTA CCTTTGTGAC GACGACGCTA 11100 

CCTGCCCTTT GAGTTCGTTC CTTGCGCAGG AGTTTCCAGA ACCTGTTGCT ACCGGTACCG 11160 

CCATTATTGC ACGGATACTG GACGCTACGT ATACCcGCGT GTCTCCACAC GCTGCCAAAA 11220

CGCTCCCCCG GTCTTGCAAG GATGCGCGCT GTCTTTCCAT TCAACGAGAT GCACGACGCG 11280

TATAGACGAC ATTATCCTTT TAGCAATCTA TGTGCCCCAC GCTATCGTGC AGGTTGCACA 11340

ATCGATGCAC TCACTGCAGT GCACGTGTAT GAGGCAGTGG TACTCAGTCA GCCGCAAATC 11400

AGTTCCTACA TTGCTCTGAC AGGCGCTGGA TTAAAATCAC CGCAGGTACT CCGCGCGCGT 11460

ATCGGCACCC CCCTTGGCGC GCTCATCGAG GAGTGTGGAG GGTTTCGCAC ACGCCCCGGG 11520

CATCTCATCA TCAATGGACT GCTCAAGGGT AGTGTTTTAG AGTCGTTGGA CCTGCCTTTC 11580

TCAAAGGGGA TCAAATCGCT CCACGTCACC GGTAAAGCGC TTTCAAGiCTC TGCGTCCTGT 11640

ACCTCCTGTC AAAACTGTGG TGATTGCGCG CGCATTTGCC CAGTATATCT TGACCCAATA 11700

AAAATTGCGC GTGCCGCACA CCGTAATCAG TTTACTGAAG AAGTGCTCCA ATCCcTGcGG 11760

ATTTGCCACC AATGCGGTCT GTGTTCTGCC GCCTGTACTG CGCGTATTCC TCTTGCAAAA 11820

CTTTTGCACG ATGCACAAGA ACGCGCACTG CATCTTTCCC GTGCTCCAGT CACCAAAATA 11880

GAACCCCACT CCACACAAAG CX3TCGGGAAA ACTATCCGCG AGGCACCTGC CAATGCGCAC 11940

CGCTGAGTAC AAACACGCAC CCTTCCTTTA CACCGGCTTA AGTGCTGGAC AGAACAACAG 12000

TGTACTGTTG GCGCTGCTTG TTGCGCACGT GTTCGTCGTT GCAGCcaTkc gCGACACGGT 12060

CGcGCTTTTT TCCATCGTCA GTACCGAACT CGGCGCACTG AGCGCCGCGC TCGTTCAAAC 12120

AcTACGCACA CCACATGTGC CCCTGAGCGA CTCTCTCGTA CTGGGCCTCC TCATCGGTGC 12180

AGTACTCCCC GCACACAAcT CTTTTTTGAA CACATTTTGT GTCGCGTTCT GTGCCgTATT 12240

TTTTACGCGC GTTTTGTTCG GTGGCAAAAT CGGGAATTGG CTCAACXTCCA TAGCGCTTGC 12300

CCCTGTCCTG CTCCGTCTGT GCACGGAGGG AACTTCCCTC CCAACGTCTG GGCGTGTCTC 12360

TGTTGTACAG GGAGCGATGT CTTATCCTCT TTTCTATTCT GCGCTTGTCG AGTGGGACGC 12420

CGCCGTGCGT ACGTGGTGCA ATACGCAGGT GTTCCAACCA CTTGGTCTTA CCCTCCCTGA 12480

GGGAGCGTTG AGCGCCTGTG TGTTCACTCA GGCTGCAGCG CCTGGGTTTC GCTATCCAGT 12540

ACTTACCCTT CTTGCTGCAC TGTGTGTATA CGCAgTGCGG GCGCGAOGCT ACATCTGTTC 12600

GTGCGCGTTC CTTGTGGTGT ACAGCACACT GTTTTTTTTa CCCGCACACG CACACCCTGC 12660

AmCCCTTGTT TCCCTCATAA AAAGCGGCGC GCTGTTTACT GCATTCTTTG TACTCCCTGA 12720

GCCAGATACG TCAATGCGCA CAAATGGCGG GGCTTGGATC TCAGGGGGAC TCTGTGcTAT 12780

GTGCGCGTTT TTTCTTGCAA AAAAGAATAG TTCTGCCCCA GATATGTGGG GTGCACACGA 12840

CATGCACTTG TGGAGTGCGA TACTACTCAC CAACATCGTA CAGCCACTCA TTCTACGCGC 12900

AGAATCCTGG TACTACTATG TGCGGAGGCG TCGCTATGAC GTACAACACT AACACGAGTC 12960

TTTCATCCTA CGCAGgATTG AGCGCATTTG CGTTGTCAGT CTTTTGCATT CTATGGGGCA 13020

CCGCGCGCAC TGGTTCTTTT TTAAAAGAAA AGGCGCTCAT CACyTGCGCC GCAGATATCC 13080

TTGCAAGGCA AGCCCCAGAA CTTGGGGTCA CGTCACGCAC CCTGCGCATG GTACCGAGCT 13140

CCCCCATACC GCAGgCTGAG GTGCTTCGGG GAAAAAAGAA TACXKX3AGAG GAAATATTCC 13200

TATACTTTTT CCCACTCAGG GGAATGTACG GTTCGTTTCC TACCXTTTTTT TTGTACGATA 13260

AAAAAGATGG TGCaCGCTTT TGCCaTCTCA TAGGTAATCA CCCTACACCG CGTGATGCAC 13320

GCTTTTATGG CATATCGAgT tGCGCGCATC GCkyTTCAGT GTAGAAAAAT AGAACACCTC 13380

CATCAAACAG TCGCATATGA GTAAGTACAC GGTTAAGCGC GCGAGTGTAT TGTGCATTTT 13440

TGGCATAGGA CTATTTGTTC CTGCAACCGG AACCTTTGCc TGCGGTCTAC TACTCGTACT 13500

TGGCTTTTGG GTTCTATTTT TTTCCTCGCT GCTGGCGAGA TTTCTCTCAC AGTTTTTTAT 13560

GCGCACGCGC AGCgcTCCTT TGTTCGAGGT CTGTCTTACC CTCTCAGCCA CCATTATGTA 13620

TGACAACTTG ATCCAAGGCT TTTTCCCGCT TGTGCGTATG ATGCTGTGTC CTTACCTTTT 13680

CATTAinCsCG CTTTCGCGCA CACTCGATCT CTGTCTTACC GCATACGATG CAGATGCCGA 13740

ATCGCTCGAA TGCGTAGGTG TCTTCGGCAT CATGATTGCG GGAATTTCTC TTGTACGTGA 13800

ATTAGTTGCC TTCGGGTGCG TTTCGCTACC GGCCCCGTCG GGGTTCTTGC GCATCATCTC 13860

TTTTCCACCC AGCAAT6TAA TACGCTTTGC AGCCACCGGC GCAGGGACCC TCATAAGCTG 13920

TGGTATTGTT CTTTGGATAT TCCGCA6TGC AGGTAACGAC CACACGCCCT CTTTAAGGAG 13980

TGAATGGTGA CAATGGTGCC ACCCCTGTTT TTCGTATGCG CCCTTTTCTT TGCCGAGGGC 14040

ATCGGATTAG ATCGCCTGGT AnC 14063

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14244 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CTCTGCTTGC CCCCTATGAG AAGACGGAGG CGCTTTCTCA CTCTTTGCGC GTGTGCTGTG 60 

CACCTTCTTC CTCTTTTCCC TCAGACGATT ACAACCGCTT CTGTCTTTTT CGCCTCAGTT 120 

TCTGAGCATT ATGCCCGGTT AAAGGATTAC GCTGCTGATT TGGCCATGAG CACCGGGTCA 180 

GGAACCCGCG CGCACCTTAT GCGCGCAAAG GTTATTTTTA AATATCCAGA CCGTCTGCGT 240

TTGGATTTCT CAAGCCCTGC T6AACAAACT ATTGTCTTCA CGGGAGATAG CCTGACCATC 300

TACTTGCCCA CCTCCCGCGT CGCGCTTGTA CAATCGGTAG CAAAAGATGA CACAGTAAGT 360

GCTGCTTCTC TAGCTTCGCC TCATGGTCTT GCGCTTATGA AGCGGTTCTA CACGATAGCC 420

TACGAGACGA GTTCTTCTCC TGTTCCCCTG GGTCCGGACA GTGGGGAGAT GGtCGTTGCA 480

CTGGTGCTCA ATCGTAAGTC TGCAGCAGAA ACATTTAAAT CTTTGCGCGT GCTTGTCTCG 540

GCACATACCA AGCTTATCCG TCGCATTGAA GCGTGGCCTC TTTCGGGGGA AAAAATAACA 600

TTTGATTTCA GCCACTATCG TTTGAACGTC GGTATTCCAG ACACACGGTT CCTCTACGAT 660

GTGCCCCCAA CCGCAAATGT GGTGCACAAT TTTCTCTTTG CTGATTGACC GCTGCCCCCA 720

AAGGACGTGA CGATGCCAGA TATTGGAGAG CTGCTAAAGA CGACGCGCGA ACGCAAACAC 780

CTCAGTCTCG AACAGGtGCG CACGAGACGA GTATCGCACG CCGTTACCTG GAGGCGCTCG 840

AGAACGATGA GTATGATGTT TTTCCCGGCG AACCCTACAT CCTTGGCTTT TTGCGCAATT 900

ACTGCGAGTA CCTCCAGCTG GATACGGAGC AGTGCATCGC TCGCTATAAA CATTTAAAAA 960

TTCAAGAAAT GTCGCTGCCA ACGGAGACCC TCCTACCGAG TAAACGGTGG GGTTCATTTC 1020

CCCTGtTAAA rGGAGTTGCC TGTGTGCTCT TCCTGGGTGG GGTGCTGGGT GTGTATTACG 1080

CGCGGCACCG CnCnTnGGGT TTTCTATCCC GtATTGTGTT CTTTGGCAGA GCACAGCGTA 1140

CCCCAAGGGA GCTGTCTCCC CCCX3ATGCAA CX3GGGGCGGT GCGCGAAACA GTGTCGCTGT 1200

CTTCTGCACA ACATGAGGAG CGTGCGCGAC GCACCGTATA CGAGCGCATC TCGCTATACG 1260

CTTGCTGAGG AAAAGTTTGA ACACACGGTC TTTCCAGGAG ATGTGTTGGT TATCAGTTCC 1320

GGGGGGAATG CGTACGAGCT AACCGTCAGC CGCACTACGC CGCACCTGTA TCTGGACACG 1380

CCCATTGGTA CACAGGTGAT CTCTCTTGGT CAGCGCCtAG TGATGGATTT GAATACAGAT 1440

GTGCAGCCGG ACGTAGAAAT AAGTGTGGAA GACATTGAAG CACATCAGGC GGACGGGGGC 1500

GCGCkTGTTC GCGTGTTTAC AGGTaGTCTG GTGCAGACGC TCCGTGAtCG CAgTGCTCAG 1560

AGCTTTGTGC CTACAAGOXSG GGTAAATGTC TCTGGTCAGA CGGGAGTCGC TGCCGGCGCG 1620

CGATATCAAG TTTTGTTTGA AGGCGGTGTT GCGTACCCGG TGACAATGAA CGCAACGTTT 1680

CGCTCGTACT GTTTGTTCCG GTACGAAGCA GATCGCACGC GGCGGGAGGA GCGGTATTAC 1740

CAAAAGGGCG AGCAGCTGAC GGTGCAAGCA AACAACGGGA TTCGGGTGTG GGCATCTAAC 1800

GGGAATGTGG TGCAGCTGCA AATTGTCGCA GGCGGTAAGA CGGTGGATGT AGGCCTCAGC 1860

CGTCCGGGGG AAGTGCTGGT CAAAGACATC AAATGGATCA AAGATGAGGA CGCCGGGCGG 1920

TTCAAGTTCG TGGTCATGGA AGTAGACTAG CGCGCGGCGG CAGCAATCGC GTACGCkTTC 1980




CAGAGCGCGT GGACTGCAGT GCACAGTGCX3 CTTGCgCGCG CGCGGGAGCC GCTTCTTTTT 2040

TTCTCTCTTA CAAAAAGTAC CCGTAgCXSCT GCGCCCGCAG CTcCTGCAAA CAGCGTGGcG 2100

CTGCcTGCGG GCCX3GTGTGC AAGAGCAAAG AGAAGGACTG ACAGTACCTC gCCACAGGcG 2160

CGTGCAGTGC AGGAAGTGGC ATGGTGGCAC AGACGCTCAG GTATATAGGC GCGAAAGAGC 2220

ACTTCTTCGC TCAGAGCATT TAAAAAAAGC CGTACATAAA AAGCTCCCCC TTCCCCTTCT 2280

GGGAAG66AA ACGGTGAGGG AAGAGAAAAA CAGAAGGAAG GAAATACTAG CGTGCTGAGT 2340

ACGAACGCGT ATTCGGTAAC AGCCGCCGCA GCXSTGnCAGT ACGTGTGGCG TGTTGCGCAA 2400

AGGGGAGAGG TGCATCTIGCG TGGAACAGTG TAACAGAATC GGGCAGAGGG GGTACAGAGC 2460

GCATATATCC CCT6CAGTGG TGATGGCCAT TGCTGCGGTG GGAGGTTTTT ATGGGACTCA 2520

CGTGTATGAG GTACCX3TTTC CGTATGCATT CTTTGGTGCA GTACAGGCGT GTGTGCTGTG 2580

TATTGGGTGT TTGTTGGTCC GCAGTGGTGT GCGGTTCTTT TCTCGTTGGG GTGCTGTCCG 2640

TATCTGGAGG AGGTGGGGAA TCGCATACAC CAGCGTATGT CGGTGTTGTA ATACGCTTTT 2700

TTTCGTGTTC TGTGGTCTGT GTGTTGCCTG CGTTGCGCGA ACCTCCCTCA TGGTACAACA 2760

AGCTCCGTTG CAAACACTTG CACAACCCCA AAAACTACGC GTTTTGACTA TACACCTTTT 2820

GCaAGAGCCA AAGCCTGCAG GCaCGCGCTT TCGTGTTCGG GCGCGCGTAT TGGGTGCAGG 2880

TTACATAGAC GGTGCTTCCT TTTCTGCACG TGGGGTGTGC ACTGTATTAT TTCCTGCAGA 2940

GGTAATTTTG CAGCA6TACG CTACCGATAT GACGGACGAC gcGGATGCCC GCX3TCTGTCA 3000

GTATTACGCG CGTGGGTTGC GCTGTCAGAT TCGTGGGCGC TTTGCATCTT CTCCACCGAA 3060

GCTTTTTATC AGTAGTTCTA CACCACCACG CTTTGTTGGC TGGA6TTCCT ATTTTGCACA 3120

GATGCGCGCA CAGATGCGGG TTGCACTCAT GAGGTTTTTA TCTCCATGGG GGCGTGCAGG 3180

GGGATTGTTA CTCGCGCTCC TTTCTGCAGA TAGTGTTTTT CTTTCGGATG AAATGCGTGT 3240

CGCGTTTCGC CATGCAGGAC TTGCTCACGT GTTGGCACTC TCTGGCATGC ACTTGTCTTT 3300

GGTAGGGGCX3 AGTGCAACGT tTTTGGGCCG TTTCATCGGC ACAAGGCACA GAGGTATGCA 3360

GGGGGCGTTT TTTGCGATGC TTGTCTTTGT GTGGTTTGCA GGTATATCGC CTTCCCTTGC 3420

GCGTGCACTT GGTATGACTT TAGTGCTGAT GGGAGGACAG ATGGCATACG TGCX3CX3TAGG 3480

ACTTTTTTCT GTACTGTGTG CTGTACTTAG CATACATATG CTCATTGCX3C CGCATGATGT 3540

ACAGACGTTA AGTTTCATGT TGTCATACGG AGCGCTTGCA GGTATTGTGT TGCTTGGCTC 3600

TGAGATTACT GAAATGATGT CGGGTTTGAT TCCTCGGCCA CTTGCATCGC TGCTTGGAAC 3660

GTCCTGTAGT GCGCAGTTTT TTACAGCACC GATAGTGCTT TCGGTCATTG GATATTTTGC 3720

CCCCATTGGG GTACTTGCCT CGTGTGTGGT TAGTCCGCTT ATCGCCTTAT TTTTGATAGG 3780

GGGGAGCGTG GCGCTGTGCT GCTCTTTGGC AGTGCCTGCT GTTGCGCCTT TTTTAAGTTG 3840

GGGTGTGTAC TTTTTTGGTG AAGGACTCTG TGCGGTTGTG CGTTTTTTTG CGTGTGCGCC 3900

GCTTGTGTAT GTACAGAGTG CCTGCGGACA TGTGTGTGCT GCATTATTTT CTTTTTTACT 3960

CGGTGGGGGA CTACTAGAGG CGGCGCGTCG CGTGCGTGTT CACAAGGATA CATATGTGTT 4020

GCCCGAATTA TAATTCGGCG CGTGCACTTG CACAGTTTTT GACGGAGCGC GGTTTGCGGA 4080

TGCATAAAAA GTGGGGGCAG AATTTTCTGC TCGATCCGGT GTTACGTACG CAGCTTGTTA 4140

AGATATTGGC GCCGGAGCGT GGGGAACGTG TATGGGAAAT tGGTGCAGGC ATTGGTGCGA 4200

TGACCGCACT TTTGGTGCAA AACAGTGATT TTTTAACAGT GTTTGAAATT GATCGCGGCT 4260

TTGTGCAGAC ATTGCGCAAA CTTTTTGATG CACACGTCCG TGTGATAGAA GGGGATGTGT 4320

TGCAACAGTG GCATGCTGCA GCAGCACAGG AACAACCTGC GTGTGTTCTA GGAAATTTAC 4380

CCTACAATAT TGCTGCCCGT TTTATTGGAA ACACGATCGA ATCAGGCTAT ATTTTTAAGC 4440

GTATGGTGGT GACCGTTCAA AAAGAAATCG GGTTGAGAAT GACTGCGCTC CCTGCACAAA 4500

AATGGTATTC ATACTTTTCA GTACTCTGTC AGTGGCAGTA TGAAGTGCGT GTGATTCGTA 4560

ACGTTGCGCC TGTCTGTTTT TGGCCGCGTC CTCATGTAGT TTCTCAAGCA TTGGTACTCA 4620

CCAAGCX5TAA TGCGGTGCCT TCTTGTGTGG ATCCTGCGCT TTTTCTGCAC GTGACGAAAA 4680

CTTTGTTTTC TGCGCGGCGT AAAACGGTAA GAAATAATTy ACTCACGTGG CAAAAAAGGA 4740

TGCCAGGCGG TGCAGCTGTG TGTGTAGAAG AACTCTGCGC ACGTGCAGGT ATTGACGCGC 4800

GTGCGClcTGC AGAGCAACTG AGCATCTATG ATTTTATTAC GCTTTCTGaT aCgctGCGCG 4860

CGCTACTGTA GTCCGGTGTG GGTGTTGAAT GGCX3CGTGTC TATATTCTTT TTTTCAGTGT 4920

GTTTTTTGTT TTTCCGCTCT TTTCTGAAGA CGCCGCGCGC GATGTGGAAC CTAGCGATGC 4980

GCCTGTGCCC TATGAGGACA CAGAATTTTC CTTATGGCAG AAAGAATTGT ATCGTTTTGA 5040

AGCGCTGTCC ATCGGTGCAT TCCCGATAGT AACGCTGCTC TCTTTTATCA CGTATGACAT 5100

CATACGTCTT ATTCAGCAAT GGTCGACAAA GCCTCCGACA TGGTGGGCGC TGATTATTCC 5160

TGGCGCGGAc TAcCGCCAcT GAGTACGAAG GAGCGCGCGA TAGTTTTTGG TGTGGCAGTG 5220

GGGATTTCTG TGACGATTGG ATTAATTGAC GTGACGTATC GTGCAGTGAA GCGTGCAATA 5280

CACCGGCGTA GTCTTGcaGC GTTCGCAGTT AGTACCAGAC CCGATAGAAC TGGTGCCACT 5340

TGATTCTTTT GTTGAGGGGA CTGACGATAG CACGTGAAGG TGCACAGGGT GTCGTATTCT 5400

TGCAGTGCAG AAAGCACGTC GGTGTGCAGT GTTCATTTTT CCGTTTTTAG AATACCGGGC 5460

GCGACGTGTC GGTGTGATGT GTTCTGCGCA GAACATGTTT TTTTTTGTGC AC6ATCTCAG 5520

CTCAAGAGGA TCGTGCGTGC ATTGTGTGTG AATGGGCATA CGGCAAAGTT TTCAAGACCT 5580

CTTCACGTGC GCGATCGGGT GTCTTTTGAG TGGGTACGCT CAGT6CCCCC GGCGCTCATT 5640

CCTGAGAATA TATCGCTTTC TATTCTGTTT GAAAACGAAG ACATTATTGC GGTGAACAAA 5700

GCGCAGGGCA TGATAGTACA TCCTGGGGCA GGCCACTGGA CGGGAACACT TGTTCAGGCG 5760

CTCAGTTTCT ACCGGGTGTA TCGTGCACGT TTTGAGGATG AGTTTTCTCG TCAATTTCAG 5820

AAAGGATTTC CCGATTTTTT CAGTACCCTG cGTCAGGGTA TTGTGCACCG TTTGGATAAA 5880

GATACATCGG GCGTACTCCT CACTTCGCGC AACATGCATG CTCATGAGGC ACTTGTACGT 5940

TCGTTTAAAA AAAGACAAGT AAGAAAAGTA TATCTTGCGT TATTGCAGGG TGTTCCTGCA 6000

CGCGGGGTTG GGGTGATTGA AACAACAATC GTGCGAGATA GAAGACGACX3 CACGCGGTTT 6060

GTTGCGTCTG AAGATTTTTC AAAAGGAAAG TACGCACGTA CGCGATACAA GGT6ATGAAA 6120

ATATGTGGGG CGTGCGCTTT TGTCCAGTTT CTATTGGATA CTGGTCGTAC CCATCAGATA 6180

CGTGTGCACG CGCGATACCT AG6ATGTCCC GTTGTAGGAG ATCCX5TTGTA TGGTTCCCGG 6240

AATATCTGTG GCATACCCAC AACACTCATG CTTCATGCGT ACGCAGTACG GTTTGTTCTT 6300

CCGAGAACGA AAAAACGCAT AACGCTGGTA GC6CCCATAC CGCTTCGTTT TGTTCGACT6 6360

ATACACCGAT TATCGGTTAG GTAGGGTGTG GCAGGTGCGT GCGTATATGC GTTTTACTTC 6420

AGCAGCTAAA TAGAAAGAAC CGATGGCAAG GAGCGCTTTA CGTTGCGCAA AACTGGCATA 6480

CAGTGCACGG GAAATAATTG ACGCAAAGTC TTCGCTCCAA AAAATTGGGA CTGTTTGGTG 6540

AAATGTCGTA CGAAACGCGT GGTATGTTTT TTGTATATCT GCATGTTTAG ATGTGCCCGG 6600

TATGGTTAAA AAAATTTCGC TGGCGGCATG TGAAAAAAGA GGGGGGAACT GGGACACTGC 6660

TTTGTCCGCC GCGCACGCAA AGAGTAAAAT GTATTGTGCA GAAGGTAGTA AAGAAGAGAA 6720

CGTACGACAT GCGCACCGTA TACTCTGcGT AgTGTGCGCA CCGTCAATCA CTATGaGTGG 6780

ATCTTCCTGc ATAATTTcAA AGCGTGcTGG cACGTATGCA CGGGaCAGTC CCCGCTCGAT 6840

TAACGTTTCG CTCACGGTAG GAAATAAATA TTTT6CCX3CG CACGCAGCCA GTGCTGCATT 6900

TTTTGCXrrGA ACAATATCGC ATAACGCGAG TGTGCAGTGT ATATTtCGAG CGAATAATCT 6960

GCCAACAGGA TGCGCGGCGT TAAAACTGAG AGTTGCaGTG TGTGTGAAGT GTTTTATtGA 7020

ACTTTCAATA TGTGTGACCA TATCtGGTAA GTAAAAGAAG GGAGCATGTT TTTCTCGCGC 7080

GATATGTTTA AAAACGTGCA ATGCATCTTC TGGCTGATCA AAACAAAAAA TAGGCGTATA 7140

GGGTTTGATA ATGCCGCCCT TTTCTTTTGC AATACTTTTT ATACGTGTTC CTAATATGCG 7200

CGTGTGTTCT TGTTCTATGG GGAGAAGGAG ACAGATACTA GGACAAATGA TGTTTGTTGC 7260

ATCTAGTCTT CCTCCAAGTC CTACTTCAAA AACGGACCAT TCCATGCGTT GTTGTGCAAA 7320 

TAGCATGAAC GCCAGTAGOG TTATAAGCTC AAACCACGTC GCCTGGCCGT AGTCGCGCAG 7380 

ATTCTCTGTT TTTTTCACCG TGTGGTATAC GTGTGTGCAC GCGCTTGCAT ACTCGGCAGG 7440 

TGAAAAAAAC ACACCCGCGC GTGTTATTCT CTCTCTCGGA TCCATAACGT GAGGAGAAGC 7500 

GTATAGCCCG GTGTTGAATC CAATTTCATT GAGTATCGCT GCAAgCATAC 6TGCGCTGGA 7560 

ACCTTTTCCC TTCGTGCCCG CAACATGGAT GCTCTGATAT GCGTTGTGTG GATTACAAAG 7620 

CGCGCGTGCA AGTGCAGTCA TCCTGTGCAG AGGTGGTGT6 CCGCTTGGGG GCATTTTCTC 7680 

AAGCGTGCGA ATGCGCTCAA CCCAGGCGTA AAAATCTTGA AAAGAATGCA CCGGTATATG 7740 

TGAAAnTCCG TGTGCGCTCG GTCGCACTAT AATATGCGGT ACGGAAGAGG CAGCAATCCT 7800 

TGCCGGGAAA GAGAACTGAT GTACATTGCT AAGGTCTGAC ATTTGAGCTA AAATCCGCCC 7860

ATGAAGCGGG GGACTCTACC AAAAGATGTG TCAGGTATCA AGATTCACAT GATTGGTATC 7920 

AAGGGCACTG GCATGTCTGC GCTTGCAGAG CTACTGTGTG CACGGGGTGC CCGTGTGTCA 7980 

GGTAGTGATG TTGCAGATGT GTTTTACACG GATAGGATTC TCGCCCX3TTT GGGTGTTCCC 8040 

GTGCGTACTC CCTTTTCTTG CCAGAACCTT GCTGACGCTC CCGATGTGGT TATCCACTCT 8100 

GCAGCCTATG TGCCTGAAGA AAACGACGAG TTGGCAGAGG CGTACCX3GCG GGGTATTCCT 8160 

ACCCTTACCT ACCCAGAAGC GCTGGGGGAC ATTTCCTGTG CGCX3GTTTTC GTGTGGTATT 8220 

GCAGGTGTTC ATGGAAAGAC GACCACGACC GCGATGATTG CTCAAATGGT AAAGGAGCTG 8280 

CGCCTTGATG CGTCCGTCCT TGTGGGGAGC GCTGTTTCGG GAAACAATGA TTCTTGTGTG 8340 

GTTCTTAACG GAGATACCTT TTTTATCGCA GAAACGTGCX5 AGTACCGTCG GCATTTCCTG 8400 

CATTTTCATC CTCAAAAGAT TGTCCTCACC AGTGTTGAGC ACGATCACCA GGATTATTAC 8460

TCCTCGTACG AGGATATACT CGCGGCATAC TTTCATtACA TAGATAGGCT TCCTCAATTT 8520 

GGTGAGTTAT TTTATTGCGT GGAT6ACCAG 6GCGTGCGGG AGGTAGTGCa GCTTGCGTTT 8580 

TTCAGTAGAC CGGACCTGGT GTATGTTCCT TATGGGGAAC GTGCGTGGGG CGATTATGGG 8640 

GTCAGTATTC ACGGTGTTCA AGACCGGAAG ATAAGCTTCT CATTGCGGGG TTTTGCAGGT 8700 

GAGTTTTATG TTGC6CTCCC CGGTGAGCaT AGTGTTTTGA ATGCAACCX3G TGCGCTCGCA 8760 

TTAGCACTGA GTTTAGTGAA GAAGCAGTAT GGAGAGGTTA CCGTTGAGCA CCTCAcGCTC 8820 

TGCgGAAGGT ACTCGCTCTT TTTCAGGGAT GCCGGCGAAG GAGTGAAGTT CTTGGGGAAG 8880 

TGCGCGGTAT TTTGTTCATG GACGATTATG GACATCATCC GACTGCAATT AAAAAGaCTC 8940

CGCGGGTTAA AAACGTTCTT TCCGGAAAGA AGAATTGTCG TCGATTTTAT GTCCCATACA 9000

TATTCGcgTA CCGCAGCCCT CCTCACCGAA TTTGCTGAGT CTTTTCAGGA TGCGGATGTA 9060 

GTTATTTTGC ATGAGATTTA CGCCTCTGCT CGGGAA6TGT ATCAGGGCGA GGTGAACGGT 9120 

GAACATCTTT TTGAATTAAC TAAACGGAAG CACCGGCGGG TGTATTATTA CGAGGCTGTC 9180 

ATGCAGGCAG TGCCTTTTTT GCAGGCTGAA TTGAAAGAGG GCGACCTGTT CGTTACGCTC 9240 

GGCX3CTGGAG ACAATTGCAA ATTGGGTGAG GTGTTGTTCA ATTATTTTAA AGAGGAGGTG 9300 

TAAAGTTCGG TTGCGGTTTC GCCAACATGG TGGTGGTGCC gGCTGTGGAT CTGGTGGATA 9360 

TAGGGTGAAG TGAGACAGGC TGCGAATGGA TGGTGATGCA AAGAGCGGAG TGCGGAGGGG 9420 

TGCGTGAGTA ATCGGTGCGA TGTGTCTGGA AATAAGGCGG TACgCATAGC AGTTTCAGGC 9480 

GCGTCAGGGT GTGGTAATAC CACCGTGTCT GCATTGCTTG CGGAAAGACT GGGACTTCCC 9540 

CTA6TGAATT ATACGTTTAG GAATATTGCC CGGGAGTTGG GTATCTCTCT TAGTGAGGTG 9600 

CTCGAGCGTG CGCGGACGGA TAATCATTTT GATAAAGCAG TTGATGCGCG GCAGCTCTGT 9660 

CTTGCGATGC GTTCTTCCTG CGTGGTAGGG TCGCGCCTGG CCATTTGGTT GGTGAAAGAT 9720 

GCCGCGCTGA AGGTATATCT TTTGGCTTCA TTAAAAGAGC GGGTGAAACG TGTTCTCCAA 9780 

AGGGAGGGAr GGGACGTACA GGATGTTGAG CGATTCACGT CTATGCGTGA CX3CTGAAGAT 9840 

ATGAGTCGCT ACAAAAAGTT GTATCX3TATT GATAACACGA ATTACAGTTT TGCAGATCTT 9900 

GTTCTAAACA CAGAAGGGTG CGATCAAGAA ACAGTGGTGA GTATTATTAT TGAAATGTTA 9960 

CX5CGCTAGAG GGATAGCTTG GTAGGGCTGA GCCAATCTGC GGGTGATATA QAAAAGTTTC 10020 

AAAACGCCAT ATTGGATTTT TATGCACAGC AGGGCAGGGA TTTTCCGTGG AGAAGTACTT 10080 

GCGACGCGTA TGnaTACTGG TGTCTGAGTT TATGTTACAA CAGACACAGA CGGAGCGGGT 10140 

GTGTCCGAAG TATGCAGAAT GGCTTCATCG TTTTCCTTCT TTGGAGTCTC TTGCGTGCGC 10200 

TCCATTTGCG CACGTGCTCC AAGCGTGGAT TGGATTAGGA TACAACAGGC GCGCTCGTTT 10260 

TTTGCATCAG TCGGCAAAAC TCATTGTTGA AAGGTATTGT GCAGTAGTTC CTGATGACCC 10320 

GAGTGAACTA AAGAAGCTCC CCGGTGTCGG TGACTATACT GCCX3CTGCAG TTGCTTGCTT 10380 

TGCGTACAAT AAGGCCACCG TGTTTTTAGA AACAAACATC CGTGCAGTGT TTATACGCTT 10440 

TTTCTTTCCC GATACGCACC AGGTCAGTGA TCGGGAGTTG CTCTCGCTGG TCCGGTGCAC 10500 

CCTGTATGAG GAAAATCCTC GGCGTTGGTA CTACGCACTG ATGGATTATG GGGCAGTTCT 10560 

AAT^GGAAG ATTACAAATC CTAATCGTCG CAGCAAGCAT TACGTGAAGC AGTCACCGTT 10620 

TGAAGGTTCT CTGAGGCAGG TGCGTGGAGC GGTTTTAAGA GAGATAAGCG GCATGCAACA 10680 
CGCGGTGCGC GAGAAAACGC TTTtCGCAAA GCTGTCCTTt GAGCACGAAA GATTGAGCCG 10740

CGCTCTAGAC TCGCTGGTAA GCGAGGGACT GGTAGTAAAA ACAGAGGCTG GGTATTCCAT 10800

CGCTGATTGA TTCTTTATGA CTCAAGACGC TTGAGTATTT CACAAATAAA GATGCCTTTC 10860

TCTTTTATTT CAATGGCATC TGATGTGAGG ATCAGCATGT TGGGCTGCTT TGCGTCAAGG 10920

CGAACGCGAT CGTTGGAGGT CTGTACCAGG CGCaGTATCT TTTTGAAGGC AGTGTCGCAG 10980

ACCTTGTCAA ACTCAATCAA TAAGCTTTCG TGTGTTTCTT TGAGTGAGAG AATGGCGAGC 11040

TTTTCGCACC GCACTTTGAT TTCXGCCACG GTAAACAACC CAGCTGCTTC CTCaGGGATA 11100

GGACCGAACC GGGTGATAGT TTCCX3TGCGT ATGCGCTCAA GCTCCTCATG CGTATGAGCT 11160

GCAGCGATTT TTTTATACAG TTCCATTTTA ATTTCATCTG CGGCAATGTA CGTATGGGGG 11220

ATGAACCCTC GGTAATTAAG ATCGATGACG GTTTCTATCC TTTGCTCGTT TGGAGCATGT 11280

TGGAGGCGTT CTATTGCCTC TTCTAACAGC TGTACATACA GGTCGAATCC GACTGAATAG 11340

ATATCTCCTG aTTGTTCTTT GCCTAATAGA TTTCCTACCC CGCGAATCTC CATATCTTTT 11400

AAGGCGACTT TGAAACCCGC CCCAAGGTCA GTAAAGTCAG AGATCACCTG TAAACGTTTT 11460

ATTGCAAGGT CTGAAAGTGC CACGTCGTGA TAGTACAGCA GATACGCATA TGCTTTTTTG 11520

TCAGACCGAC CAACGCGTCC CCTGAGTTGG TAGAGCTGGG AAACCCCGTA CATATCAGCT 11580

CTATCTATGA TGATA6TATT TGCATTGGGA ACGTC6ATAC CATTTTCAAT AATGGTGGTA 11640

GAAAGCAGGA GCTGGAACGT TTTTTGATAA AACCTTTCAA AAATGTCTTC CAGTTCTTCT 11700

GACCCCATGA GACTGTGGGC AACGCATATG GATAGCTCAG GCACGAGTTT TTGGAGCATA 11760

CACTTTACGG ATTCTAAGTT TTCGATTCTG TTATGTAGGT AAAAAATCTG CCCCTCACGA 11820

TCTAGCTCTT TTCTGATTGC AGTGGCAACA AGGTTTGGAT CAAACTGCTG GATAACCGTT 11880

TCTATAGGTA GGCGGCCTTC AGGAGGGGTG GTGAGCAA6C TCATGTCTCT GATTTTGAGC 11940

ATACCCATGT GAAGCGTTCG GGGAATGGGC GTTGCACTGA GGGAGAGACA ATCTACATTA 12000

GTTTTCATCT GCTTTAATTT TTCTTTATCC TGCACACCGA AACX3TTGTTC CTCATC6AGG 12060

ATCATCAACC CAAGATCCTT GAAGGACACX3 TCCTTTTQGA TAAGCCGGTG GGTACCCACA 12120

ATAAGATCGA TATCTCCATG CGCGAGTTTG GCGAGTATGT CCTTTTGTTC AGATTTAGGA 12180

ACAAAGCGTG AGAGCTTCTC GATTCTGACG G6AAAGTGTT TAAACXX3ATT GCAGATTGTG 12240

CGAAAGTGTT GTTCCACTAG TAAGGTGGTA GGGGTGAGGA ACACCACTTG TTTTCCTCCC 12300

ATTACCGCCT TAAATGCCGC GCGCATTGCA ATCTCTGTTT TTCCGTATCC GACATCTCCG 12360

CACACCAGCC GATCCATGGG GACGGCTTCT TGCATATCCT GTTTGACTTC TTCAATGCAT 12420
ATGCGCTGAT CGTCTGTTTC TTCGTAGGGG AATGCTGCTT CAAACGCATA CTGCCATTCG 12480

TCATCTTTTG GGAAGGCGTG GCCGCGCGTA GTTTTTCGCA GAGAGTAGAG TTCCACTAGT 12540

TTTTGCGCGA TGTTTTCAAC AGATTTTTTG ACACGTGCTT TTCTCGTTTC CCATGACTTT 12600

GACCXAAGGC TATCTAAGTG AGGTTTGTTC CCTTCATTTC CAATGTAACG TTGCACXyVGA 12660

TGTGCCTGCT CAATAGGGAT AAGGATCX3TT TCTTCCTGTG CATAGAGGAG GTTTACGTAA 12720

TCACGTTCTG ACTGTGCTGT TTTTATGCGC TCTATTCCCT TAAATAAACXT GATGCCGTAC 12780

TGCGCATGCA CCACGTAATC CCCGGGATTT AATTCCACAA ATGTGTCGAT AGGCGTGCTC 12840

CGTGCGCGTT GCACTGATTG AGGAGTTTTT CTGCGGCGAC CGAAGATTTC GCCTTCTTGA 12900

ACGATCAGTA TTTTGAGAGC AGGAATGCTA AATCCTGCAG AAAGCGCGCA AGGTAGCACA 12960

GTGACGTCGC AACCTTTGAC TAGTGCTCTG ATGCGmrTGC CTGCTGCTCA CTTTCTGCAA 13020

AGACGAAAAC GTGCCATCCG TCTTTTGAAA GACGGAGTAG CTCTTCTTTG AAGTAAGGAA 13080

TGTTACCGAA GAAGCTGCGT GCAGGATCGC TTGCCAAGCA TATACTTTCG CACGCTGGCA 13140

GCTGTGGAAA AAAGTGAGTG AAATACACCG TGTGCAGGT0 GAGCX3CGCAG ACAGCGGAAA 13200

AATCGAGCAC TATGTGTTCT GGTTGAGGAT ACCAGCGCGC AGtACATGTT CX3TGCGCGAG 13260

TTGCATTTTA TGGTAGAGGT TCCGACACTC GTCTTGGAGC GCGCGTGCAC CGTTGTGCTG 13320

GCGTTCGTAG TCAAGATAAA AGACGCTTGG GGGTGAAGGG CTGTGGCGAA AATATTCGAG 13380

AACGCAGgTG GGACGTTCAA AGCACAGTGG ATAGAACATT TCCTCCCCTT CATACGTTTT 13440

TCTGTGGGTG AGTTCTTCGA TACACGGGAC GCAGTGGGCA GGACATTCGG ACAGTTTTTG 13500

6AGATTTTGG TGGAGGAACG CTATACGCTC CTCACTCCAA AGAATTTCTT TTGCAGCGTA 13560

CAGTGTGCAC GCAGATACCT CTTGCAGGAC GGCACACGTG GACACCGCCA GTATATGGAT 13620

ACGTTCTATG GTGTTAAAAT CACACACGAT TCGGTACGCT TGTGTGTTGT CAGCAGCCTG 13680

CGCTGCGGCA GCGATATCGA GAATTTCTCC CCGGAGAGAA AACTCTGCGC AAgcGcTGaC 13740

GTGGTCGACA CGTGCATATC CCCATTGCAT AAGCTGGGCA GCGA6CGTGT GGATCTCGAT 13800

GTGCTCTCCC ACACGGAAGG AGCGTTTGA6 GGTACX3CACA TAATCGAGGG GAGGAACGGG 13860

GGTGAGCAGT GCACGCTGGG TGAAAACGaA CrcGCATGgC gGTaTGCATC GcGCTGTGCG 13920

AGTGCGCACA nnCTnCTnAC CCGGTGAGAG AACACGTGTG CGTTAGGTGA GACAGGGCGG 13980

TAGGGCAGCG ACCCCCACCA GGGGCAGCAC GCGCGTAGGA ACTGCTGCAT GTGCAAGGTC 14040

GGTGCAGACG GCGGCGACGT CCTGTTTCGG TAnGGACTAC GAGCACTATG TGTGCGCAAC 14100

ACGTGCGCAC GTTATTCGCC AAAAT^GTAG GACCGCAGTC CAACGGTGCA CCCTTTCAAA 14160
CX3CQGTAGGG AAAAGCGTGC GGCACCAGCG AnnGCAGCAA TTGCTGGAAG CTCATTTCCA 14220

AGAAATGGAG TATGCCACGC AACA 14244

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2109 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AATCGACGAA AGGTATTGTA GCAGAGAGAA TACTCAAGCC ATGCGTAAGG A6AAAAGTAA 60

ATGGCAAGTT TAGATCTACC TAAGAGTCCC AATGTGTTTC ATCCCGAAAA GCCGAGTGCG 120

GTTGGGTCAA GGAATTCACT GGCGCAGGAC TGTCGTGACC AGCAGCAGGA GGTGAACCAG 180

CTAATAGAGG AAGAGACAAA CAAGATTCTG CACCACCTGA ACACTAAACT GCCGAAGaGG 240

TTCTCGAGCG TCTGGACGTA ATGGGTGGGT TGAAGGAAAA GTTGTATAAC TACTTCAACC 300

AGAATTACCA GAACATGTTC AACCGGTACA TGGTGACTGC GGAAGACGAA ATGCTGAAGA 360

AGGTCCGTGG TTTCATCGAC CGAGAGGAAA TGAAGGTGTT GAACCGTTAC ACGCCGAAGG 420

AGATTGCCAT CCTACTGGAT GAGGTTGCGG GAGCGGATAA GTTCAACACC GGAGAGATCG 480

AGAAATCGAT GGTGAATATG TACGGGCACT TGCAGGGTCA TATACAGCGG GGTGTGAATG 540

AGCTTGAGAC GCACACCAAT TCTTTGCTGC GTCAGAAGGT TGATGTGGGT GCTTTTGTCC 600

GCGGAGAGAA TGCGTATGCG GTA6TCAAGT GTGCGTTCAA GGACAATCTT GCGCGTCCTA 660

AGACCGTCAC TGACGTGAAG TTGTCTATCA ATATTCTGGA CTCAGAGTTA GTTAGCCCTA 720

TCTTCCATTA CCAGACGACG GTAGCGTACC TTATTAAGGA TCTCATCTCC AATCACTACA 780

TAGATGCCAT CGACAAAGAA ATTGATCGCG TGAAGGACGA GCTTATCGAC CAGGGTAAGG 840

AAGAGATGTC TGATAGCAGT ATCATCTTCG AAAAGATGAA GATGGTGAGC GATTTCACCG 900

ACGATGACTG CGAGAAmCCT GACAGCAAGC GCTACGAGCT TATTTCGCGG GAGTTGATGG 960

AAAGAATCAG CAATTTGCGC GCGGAAATTG ATCCGGAAAC TTTCGACCAA TTGAATGTTC 1020

GCGAGAATAT CAAAAAAATC GTTGACCTTG AGAACATAAG GAATC6TGGC TTTAACACGG 1080

CTATCAATTC GATTACATCT ATCCTTGATA CGTCGAGGAT GGGGTACCAG TATATCGAGA 1140

ACTTCAAGAA TGCGCGCGAg CTTATCCTTC GTGAGTATGA TGACACAGAT ATTTCGAATC 1200

TTCCTGaTGA GCGTTACCAG TTGCGCTTAA AGTACCTCGA TAATGCTCAG TTGATTGAGG 1260
AGCGTAAGGG GTATGAGGTG ATGCTTCGTT CTTTTGAGAC GGAGGTGGAT CATCTATGGG 1320

ATGTGCTGCG TACTAAGTAC GATAAGTCTA AGGCGTCTAG GTTCATGGCG AAGATTACCG 1380

ACTTTGATGA CCTTGCTAAG GTGTACAAGA AGCATATAAA GAAGCATTAC AAGGATAAGA 1440

CTGGTGAGCC CGTGTACGAG GATATTGCGA AGGTATGGGA CGAGATTGCT TTTGTGAAGC 1500

CTGCTGAGAC CGAGGTGGAG CGGATGAATC GTACGTTTGT GTACGAGAAA GACAAGATGC 1560

GAAGGAAGCT TATTCTGATG CGTGGGAAGT TAAAGGGTAT GTATGATTAC CAGTATCCTA 1620

TTGAGCGTCG GGTTATGGAG GAGCGTCTCG CGTTCTTGGA ATCCGAGTTT AACCX5TTTCG 1680

ATTACTTGGT GAATCCTTTT CACTTGCAGC CGGGCTTACT GCTCGATATC GACATCACGT 1740

CTATAAAGCG CAAGAAGGCG ACGCTCGACG GTATGGCTAA CGTGCTTAAT GAGTTCTTGC 1800

ATGGTATCTC TAAAGGATTT GCGGACGCTG CCTTTGCTTC GTTTAGTCGT CGTCGTTCAA 1860

CQGTGCGTGC TGATATCGGT CAGAGTTTTG CTAGTGACGG CAgTGCCXSAC CAGAAGGAGT 1920

CCAGCGGTAG GGTGGCTTTT ATGGATATGG TAAATGAGAC TCCTGCGCTT GAGTCTTCCG 1980

TGGCCGCTGA GCAGGTGGAT GTGCGCTCGG ATGTTGGAAT GAAGACGAGA AAGGTGGCGC 2040

GGTGGATGCA GGCAAGGGTC GACGTGGTAG ACGGTCTGCC ATTCGCGAAt CTAGCGAGAT 2100

TGTAGATAC 2109



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CTGGACATGT TTCCTGCCTC TGAACCTTGG GTGAGGGAGT TTGCACAGAG GGTGGGGATT 60 

CACGTGCAAG AAGGTGCACG GCTCGTGAAT TTGCCTCGTC ACCCTAGCCA AATCTATGAA 120 

GCTTTTTTTG AAGGAATTGT GCTGTGGTGT ATTTTGTGGT GTGCGCGTCG GGTAAAAACG 180 

TATAACGGCT TTTTGGTGTG TTTGTATGTG GTGGGGTACG GAGTGTTTCG TTTTTTTATT 240 

GAGTATTTTC GTCAGCCTGA TGCGCATTTG GGGTACAGGT TTTCCGCCAC GCAATCGTCT 300 

CCGATTTACC TTTTCCAGTC ATGGAGTGAT GTTTCCACCG GGCAGATTCT GTGTGTTCTA 360 

ATCATTCTCG CAGGTTTGGG TGGGAT6TTC GCACTTTCGG CGTATCACAA GCGGGATAGT 420 

GTGCGGAAAG CGCGTGTATG AAAATGAAAA GAATGCACCG ACTGGTCCAT CAGCCGAGAT 480 




WO 98/59034 




GGGGTGGCGC GATGTACCTT GCGTATAATG CGCAAAGGTG AGCCGATTcT TCTTCCGTCG 540

TCCTAATGAG CTGGTCTTCT TCTQGTGTAC CAAGCACCGC ATTGCGCTCC A<3AGGGTTTC 600

GGTATGTGAT TTCTCGAAAG GGGTTCAGAT AGTCTTTGCA TCTCGTTTTG TGAAtTTGTG 660

TTACTCTGTA CTTTTCGATG GTAATTGCGT GTTGCACGTA GGTGCAAAGT AGCGGTGCGT 720

CTATCCTTGT TCGTGCAAGA GAACATAACA CTGCATAGGC GTGGAGTGCT TTATCCGTTT 780

CGTACGTGGC AGCAAGCGCG TCGTATTCTT TGTGGATGTG GTGCCAACTT TTGAACTTTC 840

CCGTTTCAAT GTTTTGTAGA AGTGTCTCAA ACTTATCTTC AGGTACGAGT TCTCCTCCXTA 900

TGTTTACCCA GTTGTGTGTG ATAGTTTTAG GATCGTGCGA AGTAAAGTGG GAGATGCAGC 960

GTGTTGTTTG CTCAAAGAAA GACAGGAGCG TTTTTGTGCA GTACCATATG AGTATGTCTC 1020

GGTATGCTTT GCGGGCTTCT AAGGGTTTTA GCACAAGTGT TGATCGGGTG GAGTGTTCTA 1080

TGCCGGTGAG CAGTACGGGG ATTGTCTGTG CTTTGGTGGG ATGTTTGAGG ACAATGTCTT 1140

CGGCAGTGAG CGCGCTGTTT CCCGCGTTAA CCCACGCGCG CTCTATCGCG CTGTCTAGTA 1200

AAGCGAGTGC GTTTTCAATT TCTCCTATTG TGTCGGGAGC GAAGACAGAG ATTTCTACTG 1260

TTTGTGTTTT CGTTTTTCGT TTGTCTCGCG CAGAAtTTTT TTTCGTTGCG TTCAATCGCA 1320

TACATGTTGT AGAGCCAGTA GTACX3CX3GGC ATGATTTCTA ATCTGTTTTC CCGTTCATTG 1380

TTGGTGACAA GACTAAAAGG AAATGGAATA TCAAGTTCTG CGGGGTAATT TTTCCCCGTG 1440

ATTAATACGA AAGAAGCAAA ACGACAATTA TGTTTGAGGG TACTTGCAAG TCCTGACCAG 1500

AACCCTCGTC CCGCAATAAT TTCTCCGTCA TTTTTGCGTG TATTGTGATT TGACCCAATG 1560

GTGGAGCCAG CAGCAATATT TGACTGACCA C6TATGAGTG CT6CGATGAG GAAAGAGTT6 1620

TTGTGGTGTT GTTCATGGTA GGGAAAGATG AGTGCGTTAA GAACCTCACA ACACGAAATC 1680

GTGGAGTTAT CACCTAAAAC TGAGTGAATG AGTCTGGTAC CATATTTAAG TGCXXrAgTTA 1740

TTTCCCAGTA CAAAGCGCAC CGCCTTTACC CCATAGAACA CACGGCAACC ATATCCX3ATC 1800

ACCCCX3TTCA CTAACTCTAC TCCTTCTCCT ATTTGCGTAG GTTCCTGGAG AGAT6ACTGA 1860

ACAGTAAGGT TTTTTAGTTT GTTTGCTCCT TTTACGTATG ATCCAGGACC GAAGcACACA 1920

TCCTTGATGA TACGGCaGCT TTTGATAACG GATTGTGTTT CaATCGTTCC gTAGTATCCg 1980

CgGCGAGTGT CGTGTTGTTG TTGAGTCATT GACTCGAAGC GTTGCATGAG CAATGTTCTG 2040

TCTCGGTGGC ATGCCCACAA AAACGCGTCT GCTGCGATCA TCCCTACGAA GGGAAATATT 2100

TTCCGTCCGC CTGTTTCGTT GAGAGGATCA ATGGTGATGC GGACTGCTTC TTGTTCTCCG 2160

TCTTTTATAA TTCCTGCGCC GAATTTTGCG TGGTTAGTGG TACACAGCTC ATCAATCCTG 2220




CTGAGGATGA CATGATTTCC AATTATGTAG TGAGAAATAT ATGCGCAGTG ATGGATGGCG 2280 

CAATTTTCTC CGACGTCGCA CGAAaTGAGT GTACT6TGGG TAATACCGGT TGGTACGGTA 2340 

AAGTCX5TGAT ATCGCAGAAA GgCTCGCTCG AGCGwasCGA TGCGTACGAG CCCTGCAAAT 2400 

GATGAATTAC GTATGAGTGA CGCGTCGAAC GGATCTGCTA CTAAAACGTC GTGCCAGGTA 2460 

TCGCAGTGAT TGCCCTTTTG TATAAGGGTG TGAATTTCCT CCTTAGACAA TGGTCTCCAT 2520 

GCACGTGGGG GTTCGGCGCT CTGGGAAAAG CGGAGATAGT ACTCGTCTTT TCCGCGAGGG 2580 

ATATGGGTTT GTGTGATGAA GTGGTATCCA AAAGAGGGTA GGTCTAAAAT TTGCACACGC 2640 

ATTCTCCCTT TTGGATGCCC ACTATAGGTG GTGAATTTTT ACATGTAAAT AAGGAATTGG 2700 

GGTTGTGATG GGGATGGTGA TTTCCTGCAT GTTTACTTGA CATGACATAT TAGGAATGGC 2760 

TAGATTGGGG CCCaGTCTTG TTTTTTAGCX3 TGCATTAGAA GTGGATGTaC TGGGGAGGAT 2820 

CgTTGGCGGA TAACAAAAGC TTGCGGATTA ATGGAAGTAT TCGGGTACGA GAAGTGAGGT 2880 

TGGTTGACGC TGTAGGGCAG CAGTGTkGGG TGGTGCCCAC CCCTGAGGCG CTGAGAATGG 2940 

CACGGGATAT CAATCTTGAT TTAGTAGAGG TcgcTCCGCA GgCGAGTCCG CCGGTGTGCA 3000 

AGATCCTGGA CTATGGGAAG TATCGCTTTG AGATGGGCAA AAAGTTGCGT GACTCGAAAA 3060 

AGCGACAGAG ATTGCAGACG CTCAAGGAGG TGCGTATGCA ACCGAAGATC AACGACCATG 3120 

ACATGGCGTT TAAGGCCAAG CATATACAGC GGTTTCTCGA TGAAGGGGAT AAGGTGAAAG 3180 

TGACTATCCG CTTTCGTGGA AGGGAGCTTG CGCATACCGA TCTGGGTTTT AACGTGTTAC 3240 

AGAATGTGCT TGGCCGTCTG GTGTGTGGGT ATAGT6TTGA GAAGCAGGCA GCAATGGAAG 3300 

GTCGGTCTAT GTCCATGACG CTCACTCCGA AGTCAAAGAA AOXSATGGAGT GTCGGGTAAC 3360 

TGCAGTTCGT GTTGTTGGAT AAAGGGGAGA AAGTATATGG CTAAGATGAA AACGAAAAGC 3420 

GCAcAGCAAA OCGTTTTAGT GTAACCGGGG CTGGTAAGGT AAAGTTCAAA AAGATCAACC 3480 

TGCGTCACAT TTTGACGAAA AAGGCCCCGA AACGCAAAAG GAAATTACGT CATGCX3GGTT 3540 

TTCTGTCAAA AGTTGAGCTT AAAGTGGTGA AGCGGAAGCT GTTGCCTTAC GCGTAGgTGG 3600 

CAAGCGTGAG AGGACGGAG6 AGCGTGGTAT GTCTCGATCG TTGAGTAGTA ACGGCAGAGT 3660 

GCGCCGGAGA AAGAGGATTT TAAAGTTAGC CAAGGGCTTT CGGGGTAGGT GTGGCACGAA 3720 

TTACAAGGCG GCGAAGGATG CGGTCTCGAA GGCTCTTGCG CATAGCTATG TTGCGCGGAG 3780 

GGATAGGAAG GGGAGTATGC GCAGtTGTGG ATCAGTCGCA TCAATGCATC GGTTCGTACG 3840 

CAGGGtTGAG CTATTCTCGC TTTATGAATG GTCTCTTGCA GGCTGGGATT GCGCTTAATC 3900 

GCAAGGTTCT CTCCAATATG GCAATTGAGG ATCCAGGTGC GTTTCAGACG GTGATCGATG 3960 
CTTCTAAGAA AGCTTTGGGG GGTGGAGCGT GCTAAACCTC GGTCAQGTAA AAGTGCTGGA 4020

GGAGAAGGTT GCGAAGgCGG TGCACCTTGT CCAAATGTTG AAGGAAGAAA ATGCX:GcgTT 4080

GCGGgCTGAA ATTGATGGAC GTGGTAAGCG TATTACGGAG CTGGAGCAGC TGGTGCTTGs 4140

CTTTCAGGAT GATCAGACGA AGATAGAGGA AGGAATTCTT AAGGCACTGA ACCACCTGAG 4200

TACATTTGAG GATTCTGcGT ATGGAGAAGC GCTTACGCAA CACGCGGCGA AGgTTCTAGA 4260

AAACCGGGAG CATGCX3GGGC TGTCTGAAGA ACTTACCAGC CGTACCXTAGA TGGAAATTTT 4320

TTAGTGGTCA GTGTAAAGGG GCAgTTGCAC ATCGATCTGT TGGGAGCGTC TTTTTCCATC 4380

CAGGCTGACG AGGACTCCTC GTATCTGCGT GTGTTGTATG AgCATTACAA GATGGTGGTG 4440

TTGCaGGTGG AGAAGACGTC aGGGGTCCGC GATCCcTTAA AGGTcGCGGT GATTGCgGGT 4500

GTGCTTCTCG CGGATGAACT GCATAAAGAG AAGAGGAGAC GTCTTGTACA GTCCGAGG/^ 4560

GATCTGCTGG AAATAGGGGA GTCTAnCCGA GCGTATGCTC GAATCCATCA GCAAAGTGGT 4620

GGACGAGGGG TTTGTGTGCG GGCGCGATTG AGGGTTGTGT CCTTCTTTGT GTACGGGACG 4680

TCCTGCGGTG ACGCTGTGGG TQGACGCGGA CTCATGCCCC GCGCgcGTCC GCGTACTTGT 4740

CGCGAGAGCG GCAGCGCGCC TGGGGTGTGT GGCTCGATTT GTGGCCAACC GTCCTATCCC 4800

TCTCGTGCAA AGCCCGCATT GTATCATGGT CX3AGACTCAA CCTGTTGACC AGGCTGCGGA 4860

CCGTCACATg CATCGCGTAT GCGCGAGCGG GTGATTTGGT CGTCACX3CGT GATATCGTGC 4920

TTGCAAAGGC AATTGTAGAC GCGCGCATCT CTGTTATCAA CGACCGGGGT GATGTGTATA 4980

CGGAGGAGAA CATACGCGAG CGACTCTCGG TGCGTAACTT CATGTACGAC tGCGAGGGCA 5040

GGGACTCGCC CCTGAAACAA CGTCACCGTT CGGCAGGAGG GATGCCGCAC GCTTCGCAGA 5100

CTCCCTAGAT AGGGAAACCG CGAAgcTCCT GCGGCTTGCC AGGCGGCX3GG AGGCGAAGAC 5160

AGGGGAGGAG CAGT6CGACT GGCCCTCCGC GCAAGGGAAA AGCCAAACCG GCCGCCGGTG 5220

ACCGCACGCA AGACACTAAG AGTCCAAGGC CGGGCGGGTG GACTCCTAGT 6TCTTCTACC 5280

GCTTCTGCGA GATGAACTTA AGCAAATCCA CCACACGGTT TGAGTAACCC CACTCGTTGT 5340

CATACCAGGA CACTACCTTG AAGAAGCGCT TCTCGTTCGG GAGGTTGTTC TGCAGCGTCG 5400

CCCTGCTGTC GTAGATGGAG GAGTACTGGT TGTGGATGAC GTCCGCGGAT ACAATATCCT 5460

CX3TCGCAATA CTGCAGGACA CCCCGCAGAT AGGACTCCGA CGCCTTCTTG AGCATCGCGT 5520

TGAGGTCCX5C AACGCTCGTC TCTTTTTCCG TGCGGAAGGT TAGATCCACC ACGGAACCGG 5580

TTGGTGTCGG GACACGGAAG GCCATCCCCG TCAACTTACC TCTCGTAGAC GGCAGCACTT 5640

CGCCTACCX3C TTTCGCAGCT CCAGTGGTGG AAGGGATAAT GTTAACCGCT GCAGCXX:GGC 5700
CTCCGCgCCA GTCCTTCAAA GAAACCCCAT CTACAGTTTT TTGCX3TTGCG GTATAGGAGT 5760 

GGATAGTCGT CATCAGTCCC GTTTCAATAC CX3ACTCCCTC TTTGAGAAAG ACGTGCACTA 5820 

CCGGCGCGAG ACAGTTGGTA GTGCAGCTCG CGTTGGAGAC GACCTTGTGC TCAGCAGGAT 5880 

CGAACTCATG CTCGTTCACC CCXTATTACAA TAGTCTTCAC CGGCTTAGAC GCATCCGAGC 5940 

TCTTAGCCGG AGCACTGATG ATGACTCGCT TTGCTCCTGC TTCAAGGTGA CCGTATGAAG 6000 

ACTCATTCGC GTAAATGCCG GTGGscTCAA TAACCACCTC AATACCAAGA TCCTTCCAGG 6060 

GaAGTTGGGA AGGCTTTAAG CCGCGACCXK: AGACACACTT GATCCGATGC CCGCCCACCT 6120 

CGAGGATATC CTCGGCAGGA GCACTGAGAC TAGAACCCAT TTTGCCCTGC ACGGAGTCAT 6180 

ACTTTAGCTG ATAGGCAAAG TAGCGCGCAT CGGTGGAAAG GTCTACAACT GCCGCCACGT 6240 

CGAACTCTTT CCCCAACAGC TTCTGtCCGC CATGGCCTGG AGTACGAGAC GCCCGATACG 6300 

CCCAAAACCA TTGATTGCAA CTCTCATTTG CCCAACCTCC TCTAAAAAGA GCACACATCC 6360 

CGCGCAACGC TATCTGAAAA AA6ATCGGCA CX3TCAATCCC TCTTTGCTGT AGGGCTCCCT 6420 

TOCATTTTTC TATGTGCCCA GATACCATGG CCTCGCCTTG GAAGGTCTGG CCTCTAGTGG 6480 

AAGATTATTA CCX3CGTGCTT GGTGTGTCGC ACCGTGCCTC GACCCCTGAA ATTAAGT6TG 6540 

CCTTCAGAAA GAAGGCAAAG GCGTTACATC CGGATCTCGT TTCCCATACT GCAGAACTTG 6600 

AGT6CGAGGC GGTAgCgCGC GAGCGCgCTC TTCGCCGTAT ACTCACCGCA TACGAGGTGC 6660 

TCTCTGATCC GGGGCGTCGC GCGAAATTTG ACCTCCTCTA CGCGCGTTTC TGCX3CACGTC 6720 

CTGCTCCAGC GGGCTTTGAC TACCGCGTGT AmCTGCGTGC GCAGGtACGC TCTGCGCX3AT 6780 

GGTGGAGCTT ATCTTGTTTG ATCTCTTTCA CGGTTTTGAG TGTGACGCTG TCCGCGCGTA 6840 

CTTGTCCCTC AAGTGTCGGC CAGAAGGGTT CAACCTCGCC ACTCACCTTA CACGAGAGGA 6900 

TTTTATCGAC TGTGGCTTTG TGCTCGCAGA GGAATTGCAT 6TACGGGGAG AGTGCTATGA 6960 

ATGCTTTACT TTGCTCCAGG ACATCGTTTT TGAAGAATT6 CGGTGCGCGT ATTTTCGTCA 7020 

TTTTTTTCCT GAAGTACTGA AGCTCGCTGA GCATATCGCG CTCGGTAcTG CGTCTGTGCG 7080 

TGGTCGCAAC GGTAAATCCT GCGTATACTG CGCGCGCGCC ATGCCTGCTT GCCTGCGCAA 7140 

GAAATTGTCA CCTTCTACGC GTGCTTAGTT GAGTATTACG AACGTACGGG AGACCGCAGc 7200 

GTGCGCGTGG CTATGCCCAG AAGATGGATT CTGTCAGGTG AATGTTTGAC TGCACCCTGG 7260 

CX3GAGGAGTA CCGTGGTCCT GGGGGACCTC CGAAGGcTGG AGGTCCCOCT GCAGCTAGTG 7320 

AACGGACAGA GGAGGGACGC TTGAGCAGGA AGGAAAGGAC CTCATGATCC GCATTAAAAC 7380 

ACCAGAACAA ATCGACGGTA TCCGTGCCTC TTGCAAGGCA TTGGCGCGCC TTTTCGACGT 7440 
TCTTATTCCG CTTGTCAAAC CGGGCGTTCA AACCCAGGAG CTTGATGCGT TTTGCCAACG 7500

CTTCATCCGC TCAGTCGGTG GTGTTCCTGC CTGGTTCTCG GAAGGTTTTC CTGCCGCTGC 7560

TTGCATTTCA ATCAACGAAG AGGTCATCCA TGGTTTACCT TCAGCGCGTG TGATTCAGGA 7620

CGGGGATCTT GTTTCCCTTG ATGTTGGTAT CAACCTCAAT GGATACATTT CTGACGCGTG 7680

TCGTACTGTT CCTGTCGGTG GAGTTGCACA CGAGCGACTA GAACTTTTGC GTGTAACCAC 7740

TGAGTGCCTC CGTGCGGGCA TTAAAGCGTG CCGTGCCgGA gCnCGyGCGC GCtgTTTCTC 7800

GCGCTGTATA CGCTGTTGCA GCACGGCACC GCTTTGGCGT GGTGTACGAA TATTGCGGAC 7860

ATGGCGTGGG GCTTGCCGTG CATGAGGAGC CGAACATCCC CAATGTGCCT GGCTTGGAAG 7920

GGCCTAATCC ACGTTTTTTG CCCGGTATGG TAGTCGCGAT AGAACCCATG TTGACGCTTG 7980

GCACAGACGA GGTGCGCACC AGTGCAGATG GCTGGACGGT GGTAACGGCA GACGGATCGT 8040

GTGCCTGCCA TGTGGAGCAC ACTGTGGCAG TTTTTGCAGA CCACACGGAG GTTTTAACAG 8100

AACtACGGAA GTAGAGCGTA CCGGCTAGTC AGCTATCTTA AGTGTGCGCG GTGTGCTGAT 8160

AGTACATGCA GGGAGCAGTT TGTGCACGGT AGGCAGCGTG TAAGTGTACG TGGCGGGCAC 8220

AGGTGAAGAG GGGATAAACT C6TAACCATA TCGCTGTGTG CTGCTTTTAA CCCGGGCTGT 8280

GTCGGTAGGG GTTTGGGTAC GCGCAgGGAC GTGGAGGGAC TCATGAACAT ATTGTTTACC 8340

TCGTTTGTGT GTGGGGTACA TGCGGTATGC CGCAGTTTTT TTACAGCAGC GGCGTTGCTC 8400

GTTTTTATCT GCTGCTCTGG TCATCCAAGT TCTGCGCGTG TGCCCtCTGC AGACACGATA 8460

GCTCGGCGCG TTGCCGGAGA CAGTGGGAAC gCTGGGGGGC GGACATTACT TCCTGTGQGG 8520

GTTTCgCGTG AATCGGTGCA GCTGTTAGAA CGGCTGCAAA ACGCGAACCG TCAGGTAACT 8580

GCCGAAGTGC TGCCTTCAGT AGTGACGCTG GATGTGGTGG AGACCAGAAA GGTTCGGGTA 8640

CGTGATCCGT TTGGCGGTTT TCCX5TGGTTT TTCTTTCGTG GTCCTGAAGG TCCGGGTGCG 8700

GGGnCTGGCG GTGGTTCTGG AAACAAAGGG GAAGCTGAGG AACGGGAGTA CAAAACGGAG 8760

GGACTTGGTT CTGGAGTCAT TGTAAAGAAG ACAGGGAAGA CGCATTACGT GCTTACCAAC 8820

TATCACGTGG CGGGTAAGGC TAATGAGATA GAGATTAAAC TGCACGATGG CAGAATCGTA 8880

AAAGGTAAAC TTGTCGGTGG TGACCAGCGC AAGGACATCG CGCTGGTCTC CTTTGAGGAC 8940

GCAGACCCAA ATATCCGTGT TGCCGTCCTT GGTGACTCGG ATGCAGTACG GGTAGGAGAC 9000

ATTGTGTTCG CAGTTGGCTC TCCTCTTGGG TACACTTCCA CTGTAACGCA GGGGATTATC 9060

AGTGCGCTGG GTCGCTTTGG GGGACCGGGC AACAATATTA ATGATTTTAT TCAAACAGAT 9120

GCGGCCATAA ACCAGGGCAA TTCCGGGGGA CCAATGGTCA ATATTTATGG CGAAGTGATT 9180
GGGATTAACG CGTGGATTGC CTCCTCAAGT GGGGGATCGC AAGGGATTGG TTTTTCAATT 9240

CCTATCAATA ATGTGAAGTC GGATATCGAA TCATTTATCC AGTACGGGCA GGTGAAGTAC 9300

GGGTGGTTAG GCGTGCAGCT GGTGGCAACG GATGCGGACA CCGTAGCATC GCTTGGTATT 9360

GCAAAGGGTA CAAAAGGGGT GCTTGCGGCG GAAATTTTCT TAGGTTCTCC TGCGCACAAG 9420

GGGGGACTGA AACCGGGCGA TTACTGTGTA AAACTGAACG GAAAAGAAGT AAAGGATGTA 9480

AATCAGTTTG TGCGGGATGT CGGCGCGCTG CGCATTGGGC AAACAGCAGT ATTCGATTTA 9540

ATTCGCGGTG GTGTGCCGAT GACGCTTTCG GTGCGCATTA CGGAGCGTGA TGAAAAAATA 9600

GTAAATGACT ACTCAAAGCT TTGGCCTGGG TTCATCCCAC TGCCGCTTAC GGAGGCCGTG 9660

CGTAAACGTT TGGATTTGAA AGCGTCGGTG CGTGGTGTGC TAGTTAGCAA CGCGCAGAGC 9720

AAAAGCCCTG nCGGCGCTGA TGGGATTGAA GTCGGCGGAC ATAGTAGTGG CGGTCAATGA 9780

TCAAAGAGTC TCGAGCGTGC GTGAGTTTTA CGCGGnGCTT GCACGTCAGA CGAGGGAAGG 9840

TGTGGnTT 9848

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7415 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CAAAAGGAGA TGTCATTCTA TGTGGAGCAT GCCATTCGCG TATGCACCTG GGTTGTGCGC 60

AGACACTTCC GCCTGATGTG CGCACCAGAT AGAGTATGTT GTGGAACGCA CTCGGTCTCA 120

TCGGGGAACT ACTGTTGCGC TTGCTATAAA TTATGGGGGA AAAGATGAAA TTTTACGTGC 180

GGTAAAAAAG GTTTTGTGCA GCACTTCGTG CCCGGATGGT GAGCTTCTCA CCGAAGAAGC 240

TTTCGGCGCG TGCCTTGATG CGCCGCAGTT GCCGAGTGTC GACTTTCTCA TCAGAACAGG 300

GGGTCAGCAA CGCATGAGTA ATTTTTTGCT TTGGCAAAGC GCGTACGcGG AGTTCTATTT 360

TACCGATATC CTGTGGCCTG ACTTTCQGGT AGAAGACATG cTGCGCGCCC TGGATGAGTA 420

TCGCCTGCGC ACGCGTACCT TTGGGGGTTT GGAATGAGCG CGGAAATAAA GAGGCTGTTA 480

ATCTTTTTTT TCGGCGtTCC AACTATTCTT ATGTTGGTAT ATGCGGCACC GCATGcACAC 540

TTCCTAGCGT TCCATTTGGT TATCTTCGGA TCAGTTATGG GTGCGGTATG GGAAATGCAT 600

GCGATGGTGT CgcGcAGGAT GTGCACGTAC CCACTGGTTT TGTTGATCCC TTTCAGTCTT 660
GTGCTTCCGC TTTTAGGATA TGCAGCGCTG TGGCAGCCTG CACGGGGCGC TGAATCTGTC 720

CTTTTTATTG GAGCACTGGG CACGCTGCTC ATGAGTGTTT TTTTCACCGA ATTGGTGTAT 780

TCGTTTTCTG CTTCTTTTGA AAACGCCXTTT GAGCGTATGG CCTCGGCACT GTTGCTTGTT 840

TTGTATCCAG GTATCTTTAG CCTTTTTTTT TCGCTCATTA CGCGGTG6CG TCATGCAGAG 900

ATCGCaTTGG TAATTTTTTT yCTCATGGTT TTTACGTGCG ACTCTTGTGC ATGGTTCTGT 960

GGGACGCTCT GGGGAGTCAA CAACAGAGGG ATAATTCCTG CAAtCCTAAA AAGAGTATgC 1020

AGtTTTATgG AGGTTTTgCC GGTTCGGTAG GTGCAGGgTG TTTTGGCTCA CTtGTATTTG 1080

GTTCGCgTGT GACGCTCTCT TTGGGGATGC TCATGGGTGT TGGAGCCTTG GTAGGACTGA 1140

CTGCCATTGT AGGCGATCTA GTCGAGTCGG TGATGAAACG TTCGGCTCAG GTAAAGGATT 1200

CAGGATTTTT TACCCCCGGG CGGGGCGGAA TTATGGATAA CCTGGATTCG tTGCGCCGTC 1260

ACTGGGGACT TTTTACATTG CATGTGAGTG TTTTGGGATC GCTGCAGTAT GAGTGTGCGA 1320

CGTGTGGTAG TGCTGGGCAT TACTGGTTCT ATTGGAGCTG CAGCACTCAA ACTTCTGCGT 1380

CGGTTTCCCG ATCGGTTCTT GCTGGTGGGC GCTTCAGGTC ACCGGCAGAC CGAGTACGCG 1440

CGGGCGTTGG CGCGCGAGTT CTCTTTATCA GATATCACTA TGACTGGCTC ATGTTCTGAG 1500

CAAGAAGGTC GCGCACGCAT AAAGCGTCTG CTTTCTTCCT GTGAAGCAGA GGTGGTG6TA 1560

AACGGTATTG CCGGCGCTGC TGGTCTTTTT GCCTCTCTTG AGGTGCTCAA GACGCGTTOT 1620

ACGCTCGCGT TAGCAAATAA A6AAAGTGTG GTACTTGCAG CTTCTCTTTT GCATGCTGCG 1680

GCACGCGAAA GTGGGGCAAC AATCGTTCCT GTAGATTCAG AGCATGCTGC TATTTTTCAA 1740

CTTATTGCAG cGCACGGCGC GCATGCX3GTG GCGCAGGTAG TGCTCAcTGC GTCAGGTGGT 1800

CCATTTAGAA CCTTTTCAAA QGAGTGCTTA GCGCATGTCA CGGTGGAAGA TGCGCTTCAA 1860

CATCCGACGT GGCGTATGGG GAAGAAGATT TCTGTTGATT CTGCAACACT TGCAAATAAG 1920

GCACTGGAAG TTATAGAAGC AGTGCAGTTT TTTCGTATAC CGGTGGATCG GGTCaCGGTG 1980

GTGGTGCaCC CTCAGAGCaT AGTGCATGCg CTGGTGcAAT 6TCATTCGGG AGAAACGTAT 2040

GCGCAGCTTT CTGTCCCTGA TATGGCGTCG CCGTTACTGT ATGCGTTGCT GTACCXTTGAT 2100

GCGCCTCtGC GTATCAAACT CCGCTTGATT TTACATCGGG ACTGTCTTTG CATTTTGAAC 2160

CTCCGAGGGT AGATGACTTT CCGCTGTTGC GTATGGGTTT TGATGTTGCA CGGGCGCAGC 2220

GTGCGTATCC TATTGCCTTT AATGCAGCAA ATGAGGAGGC GGTGCGTGCG TTCTTGCAAA 2280

GAAACATTGG GTTTTTAGAT ATCGCACACG TGACTGCACA GGCGTTGCAA GAAGATTGGC 2340

GCGCAATTCC CCAAACGTTT GAAgAAGTTA TGGCgTGCGA TAmGCGTGCg CGGATGTGTG 2400
CgCGGACGTG CATTGCACAG AGGTGGAGAG AGAGGTGATT AAGATAATTA TTGGCGTTGT 2460

GGTGCTTGGT ATTGTGGTGT TGTTTCATGA ACTGGGGCAT TTTGTCGCCG CGCTTTGGTG 2520

TCGAGTGGAG GTGCTCAGTT TTTCTGTCGG TATGGGGCCG GTCCTGTTTC GAAAGAAATT 2580

TGGAAAAACG GAATATCGCC TTTCGATGCT TCCTCTTGGG GGGTATTGCG GTATGAAGGG 2640

AGAGCAAGCG TTTCAAACGG CGCTTGATCA AAAACTTTCC CGTATTCCCG TTGAGCCCQG 2700

TTCACT6TAT GCaGTAGGAC CGCTCAAACG CATGGGTATT GCCTTTGCAG GACCGCTGGC 2760

GAATGTGCTT ATGGCGGTAA TGGTATTGGC ATTGGTTAGT GCGCTTGGCT CGCGTGTACA 2820

CACATTTGGA AACCGTATTT CACCGGTGTA TGTATACGAT AGTTCTGATA ACTCGCCTGC 2880

ACGCCGCGTG GGACTTCAGG ACGGGGATAC AATcCTGCGC ATTGGTGACC AGCCGATACG 2940

CTATTTCAGT GATATTCAAA AAATTGTATC ACAGCATGCG CAGCGTGCAT TGCCATTTGT 3000

GATCGAACGG AGGGGGCAGC TTATGCACGT GACCATTACG CCTGATAGAG ATGCGCATAC 3060

TGGCATGGGG AGGGTTGGTA TTTACCATTA CGTACCGCTA GTTGTTGCGG CGGTTGATGC 3120

ACACGGTGCT GCATCGCGGG CAGGTCTTGA ACCTGAAGAT AAAATTCTTG CAGTAGCAGG 3180

ACGCCGTGTG CAACACcAGT ACAGCTCCTT GCGCTGCTCA AGGAATTTCG AAAAAAGTCA 3240

GTCGTATTGA CTGTGCTGCG TTCAGGGAAG AGGCGATATC ATACCATTGC GTTAGTGCGC 3300

ACAGAAAACG GGGCAATAGA T6TTGGTATC GAATGGAAAG CTCACACCGT OGTTATACCX3 3360

GGAACTTCTT TTTTTGCAAG TGTCCGTGCG GGCATTGCAG AAACGTTGCG TATGTGTGTA 3420

TTGACGGTGA AGGGTATTGG TATGCTCTTT CX3GGGCCTGC AATTTCAGCA GGCTATCTCA 3480

GGCCCATTAA GGATTACGCA TGTGATAGGA GATGTGGCCC AGCATGGTTT TCAGGAGAGT 3540

TTTTTAACX3G GACTGTCACA ATTATGCX3AG TTTGTGGCAC TCGTGTGCGT CTCTCTCTTT 3600

ATTATGAATC TACTCCCCAT TCCGATCCTG GACGGCGGTT TGATTTTATT CGCATGTGTT 3660

GAATTGTTTA TGCAAAGAAG CATACACCCG CGTGTGTTGT ACTATCTGCA GTTTGTAGGT 3720

TTTGCGTTTG TTGCATTGAT ATTTTTATGT GCGTTTTGGA ACGACGTGAA TTTTTTGTTT 3780

CACTAGGAGT GAGTGATGCA GTTACGGTGT GCGTGTGA6C GGGTGTTCGA TATTGAACAT 3840

GAGACX3GTAA TTTCGCTTGA TGAGCACXTCG GAATTTGTTG CGCGTATACA GCAGGGGGAT 3900

TTTTTAAGTT ACCAGTGTCC GGCATGTGGT GCGCGTATTC GTGCCGAAAT AAAAACAGAA 3960

TTTGTGTGGC ATCCGAAGAA TGTGCATTTG CTTTTGGTTC CTGaGCGAGA GCGTTTGCGG 4020

TGTTTGGCTT TTTGTGCCXX3 TATGCATATG AGCGACX3GAG ATAGTGCTGA CTTTTGTGAA 4080

CCCTTTGTCT TACGGGAGCA CCAGACACCC GTGATTGGCT ACGCAGAACT TGCTGATCGT 4140
GTTGCAATAC TAGCATGGGA TTTGAACCCT GAAATTGTTG AAGCAGTAAA GTTTTTTGTG 4200

TTGGAAGGGG CACCGCATCT AGGAGACAAG AGAGTTTCGT GTTTTTTTGA ACGTTGTGTC 4260

GGGGACACCG GATCGCGCGT GATGGAGTTG CACGTGTACG GTATCAGAGA ACAACAAACG 4320

GCAATTATGC CGGTTCCCAT GAATGTGTAT GAACGCGTTG AGCGAGAGCr mGGTAAACAA 4380

GCGGAGTTGT TTGAGGCGCT GTATGTTGGG GCX5TATCTTT CATACAAGAA TGTTTTTACT 4440

GACGCGTAGC SCCGCACAGC GAGCAGCATC TGGTGTGCGT GGTGTATGGG GTGTTGTACG 4500

CTTGGGCTTT TTGCAGACAG TGTAGAGAAG CXSCGCAgcgA AGGATGTGTT TACTGAACCG 4560

GCGCGCTTTT ATCCCTCACA AAAATCAACG CTTGAATCTG CCCGGTCTGA TACATCTGAA 4620

TCTGAGAATG CATCTTCTTC CGTTCCTTCC CACAGTCAGC AGGAGTTGGC GCCAGACTCT 4680

GCCGCGCCTG CGCGTAACTC TGTGTTGTCC CCTGCTCCTC CTGAAAGGAG AGAGAAGCAG 4740

GGGACTGCGG TGCATGGGGC GGAAGTGACG CGGGCGGGAG CTGTCAGCCC GCGTTTTGTA 4800

GGGGGGCTGA CAAAAATACT GGCCX5CCTCT GACCATACAT TCTTCGCTGC AGGAAATGAT 4860

GGGTTTCTCA CCCAGTACAC GTATCCGGAT TATAAACCGG ATACGTGGCA GATCACCCCT 4920

GTTTCTATCA AACACTGTGC AGTGCATCCG GACCGCGCGC GTATTGCCGT ATATGAAACA 4980

GATGGACGCA attacx:accg AGTCAGTGTG TGGAATTGGC GCACGAAAGA AATACTTTTT 5040

GCAAAGCGTT TTACCGCATC GGTTGTGTCA CTCTCGTGGA TTGTGCAGGG AAGTTTTTTG 5100

AGT6TGGGAA CAGCATCGCG CGAAGgTGTG ACGGTGTTAG ATGGGAGTGG AAATACAGTT 5160

TCTCTATTTT CGGAAGAGCC TGGGGTGGTG TTGTTGACTG CGAGTGGACC GCGCCTTGTG 5220

CTCAGTTATG CAGAATCTGG ACGCCTCACG TACGTAGATT ACAGCAAAAA GACAACCGTC 5280

AAACGTCTTC TTACCGAAAA GAATCTCCTG TCTCCCATGT TAATACATAA CGGTGCACAT 5340

CTTGTCGGTT ATAGAGACCA ACGTGTGTAT GTCATCCAGT CTTCAAGTGG CGCGGTGCTC 5400

ACCGAGTACC CTGCACGGA6 tGcATGtTTT GCGCATACAT TCAGCX3ATAG TCTTCCTGTG 5460

TGGATAGAGC CTGCTGAGTT GAAGTATCAC TGGCGTATAC GGAAAGcTGC GCAGCGTTCT 5520

GCT6ATTTTA TGCTTCCTGA CAATGCTCGC ATAACAAGTG CGTGCTCGGT TCGCACGCGG 5580

GTCATCGTAG GAACCGATCG CGGGATCCTC TATGAATTGC AGCAGGGAGA TGACAGGCGC 5640

GTAACTATCC GCGCACTCAA TGGCGAGCGT CAGATATACG CAAGCX5ATGT ACATGGTGCA 5700

GATGAGGGCG CGTATTTTTT AGCAGACGGA TCCCTATATC ACAGCATGGC GTCCGGGGGA 5760

CCGTATCGTG TTTTGGTGCG CGGAGTAAT^ GGAACTCGGT TTCTGCCTTA TCGTGATGGT 5820

TTTATTGTGT GGTCTGCAGG GAAAGAAACA GAGTTTCTTC ATTGTGCGCA AAAGACGAGT 5880
CAACACAGGA TGATATATCG CGCGCGTTCC ACGGTAAGCG GCGTGTCGGT GTATGGGCGT 5940 

ATGTTGGTGA TTACTGAACC TTTCTCTGGA GTATCGGTGG TGGATATTGA GCGGGGGATA 6000 

CgAGTTTTTT TTCACAAAGC GATTGGTATG CAGGATTCGC TATTGATTAC TGATGACGTA 6060 

ATTGTAGCCA CTCAAAGCGG TTTGCAGCCA CTTGTCCTGC TGCATATGCG TACGGGGGAG 6120 

ACATATACGC AGCGGTGGGA GGCGATTTGC CTTGGCGTCC GCGCX5CATGA TACACAGCAT 6180 

GTATATTTTT TTTCGTTGGA TACGAATGCG GGCACGACTG ATTTGATCCA TTTCGTCTGC 6240 

AACTGCAGCA ACCCACAGAA AGTGTTGTGC GACGCATCCT CTCTTAtAAG GATGAGGATA 6300 

TAGATGCGCA TATGGTGATG CGGCGTTCAC TGTTGGTAAC TAATTTAGGA AAAGGGGCGC 6360 

TTGTCGGACA TCGCGTGCAA CAGTCGCAGG TGTATCGTAT GTCCCGTGCG TATGCGTTAC 6420 

CAAAAGTTGC TGCAATCACG TCGAACGGAG TTGTCAGCGT GAATTACGAT GGTTCAGTTT 6480 

CGTGGTATGA AGGCGACGGT GCGACATTGA AAGCAACCGA ATTTATCCGG ACCGAAGATT 6540 

TTTGAACGGG TACACAAGGT GCGGTGTATT TTGTAATTCG GCACGGTGGT ATGAAT6CTT 6600 

CCTAGTTGGT CTTGACAGGG AGCTCCTTCT CGGGGGAGGA TGGGCGGGGT AGATGTTGGT 6660 

TCGCTACAGT TACGATGCAA AGGGAAGGCG GTTGGGGCGT GCGCTGGTGT ACACTGAGTC 6720 

GGAGCACGGT ATACCTCGGC AGAGCGTTGA CGCTGGGGCG ATAAGGgTTG TAGAAGCGCT 6780 

GGTGGGTGCG GGGTATGAAA CCTATATCGT CGgTGGGGCG gTAAGGGACc TGGTTGCGGG 6840 

AAGGACACCA AAAGATTTTG ACATTGTTAC AGGCGCAGTT CCCTCTAGGA TTCGTAGGTT 6900 

GTTCAGGAAC TCGCGCATTA TCGGCAGGCG CTTCCGCATT GTTCATGTGT CGTGTGGCTC 6960 

GCAGCTGTAC GAGGTTTCCA CCTTTCGCTC TCGTGTGGGG GAAGgTTCGG TGTGTGTTCC 7020 

TGGCACGTTG GAGGAAGATG CATGGCGGAG GGACTTTAGT GTCAATGCCT TGTACTATGA 7080 

TCCTCTGAGA AATGTGGTGA TCGATTGTGT CGGTGGAATG GTTGATCTGA AGAGGC6TCG 7140 

CGTGCGGCCG CTCATACCTC TGCGGTCCAT CTTTGTAGAG GACCCAGTGC GCATGCTCCG 7200 

GGCATTGAAG TGCTCGGTGA TGTGCGAGTC TTCCATCXTCT TTTTCTGTCC GCCGCAtATT 7260 

CGCCGCAtGT TTCCCTTCTt GGGGGGTGCT CTCCCTCCCG GTTGACCX3AC GAATTtGTAA 7320 

AAATCCTCTT TtCCGGTCGG AGCGcCsCGC TTGTGCGCGC CCTATGTGGG TAmCAGCTCC 7380 

TTCTGTACTT GCAGCCX3TCT GTGCACTACT TTATG 7415 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5271 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



CTTTGATTGG TAAGGTAAGA G/^CCGTCAA GAAATAATCC TTTTAAAGTT TTTAGTTCAA 60

AAATAAAACG CGTTCCACTT ATTTCAATAG TTTTCGTTTC TCCTTCAGTT CTAGAATAGC 120

GAACGTTTCC TGTGGAATAC AGAATTTGGG CAGAGGTGTT GTAGACAGTT TGATCTGAAA 180

GAATCGTTGC GGTAACGcTC CCATCATCAA TGGAAATGGA AACATTCCCG GTAAAGACTA 240

CCAACTGATC TTGCAAATCA AAGGGGGACA GCGGCACGCG gCCGGTACTT TCCAATATAG 300

GTTGTCCAGT TTCAGAAAGG CGCGTAGTTT CCTGTGCCGA ATTAATAATA ATTCTTAATT 360

TGCGTAGACC ACTTTCACCA AAGAGGGGAC AAAAAAGAAA GAATATGCCC CAAATTGGAT 420

ACCATGCTCT CATGTCTTTT TACACAGCAC CTCTTTGATG TACCAACCGG TGCGAGACTC 480

AGTTATCTGG GATACTGCTT CAGGGCTTCC CTGTGCAACG ATAGTTCCAC CGTGCATTCC 540

TCCTTCAGGA CCTAAATCGA TAACACAGTC TGCCTGAACA ATAACATCCA TGTTATGTTC 600

GATCATCACA ACCGTATTTC CCTGATCTAC CAA6CGTTGA ACAACCTCCA TTAATTGGAT 660

GATATCGGCA AAATGCAATC CGGTAGTAGG TTCGTCAAAG ATATAGAGAG TTTTTCCTGT 720

CGCACGCTTT GAAAGCTCAA GTGCAAGTTT AACGCGCTGG GCTTCTCCCC CTGACAACGT 780

CAGAGCAGAC TGTCCTAAGC GCACATACCC AAGCCCCACC GAGcAGAGAG CTTCTAGCTT 840

TCGTACTATA GGrGGaACAG CAGAAAAAAA AgAACGsGCT TCTTCGATCG TCATGTCCAG 900

CACATGGGAA ATGTTCTTGC CCTTATAAAA CACAGCTAAT GTCTCCCGGT TAAACC6GGT 960

GCCGTGACAC ACATCACAGG TAATGTACAC ATCAGGTAAA AAATTCATTT CAATAGTGAT 1020

AACGCCATCA CCTTTACAAT GCTCACACCG TCCTCCAGGA ACATTGAAAG AAAAACGTCC 1080

TGGTTTATAT CCCCGCATTT TTGCTTCAGG AACcTGGGAG AACaGCATTC TAaTATCTGT 1140

AAACACACCC ACATAaGTTG CAGGATTTGA AcGAGGAGTT CTCCCGATAG GACTTTGGTC 1200

TACATAAATT ACTTTATCTA AATGCTCCGT CCCCTCAATC GAGGAAAATT TTCCTTcAGG 1260

AAGTCTGCCG TTCATCACAC GGTTGTATAA CGCAGGATAT AGCACATCAA TTAAAAGCGT 1320

TGATTTACCC GAGCCGGATA CTCCGGTAAT GCAGGTAAAA GTACCGAGTC GAATACGTAC 1380

AGAAATGTGT TGCAAGTTAT GTTCATGGAC GTCATGCACC GTAAGAACAT TTCCATTTCC 1440

CGTTCTCCGT ACTGCAGGAA TGGGTAATGT AATTGCACCG GCAAGATACT GACCAGTAAG 1500

ACTTGCTTGC ACCTGCATAA CTTCAGGTGG ACykCtGCGG CGACAACATA TCCTCCGTGA 1560
ACACCXTGCAC CGGGGCCGAG ATCTACAATA TAATCTGCTA CGCGGAGcgT TTgCTCATCG 1620

TGCTCTACCA CAAGCACTGT GTTTCCCAAA TCACGCAAAT GAAGAAGCGT TTGGATCAGT 1680

CGTTCATTAT CCCX5CTGATG CAAACCAATA GACGGTTCGT CCAGTATGTA CAAAACCCCT 1740

GTAAGGCGCG AACCTATCTG GGTTGCCAGT CTAATTCGTT GTGCTTCTCC GCCGGATAAC 1800

GTGGCAGCAG CCCGTTCCAA GGTGAGATAT CCAAGACCCA CX3TTCTGAAG AAACTCTAGG 1860

CGATCGGTAA TTTCTTTCAG GATCTGTTGC GCAATTGTCG CTTCTACTTC TGTCAGATGG 1920

AGAGTTTTAA AAAACTCACA CGAATCATCT ACAGACAGCg CACTGAGTGC GTGGATGTTT 1980

TTTTTTTCTA TAGTCACCGC AAGCGACTCT GGCTTTAAGC GCATCCCTCG ACACGCTTCA 2040

CATGTACGCA CCGATAAATA CCGTTCATAT ACCTCGCGCT GTGAGTGAGT ACATGACTCT 2100

GCGTATCTCC TGTGCAGCTC GCTAAAAATT CCCGGCCACG GCTTAATGTA GCX3TGCGGTA 2160

CGAGAGCCAT CTTTTCGTTC ATGGGAAAAC TCAAGAGCCT CGCTGCCACT TCCATGCAGG 2220

ATAATATCCA GTGCGTGTTT TGACAGATTG CGTACCGGAT CATCGAGAGA AAAATGGTAC 2280

TTTTCTGCGA GTGCAGCAAA CCGCACACGG TTCCACTCAT GCTCAGGTTT AAATGGCAAA 2340

AAAGCACCCT CGTTAAAAGA ACGGTTTTGA TCAGGGACAA TGCGATCTAA ATCAAATGTC 2400

TGCATAATCC CCAGTCCTGC ACAGCTCGGA CAGGCACCAA AAGGTGCGTT AAAAGAGAAC 2460

AAGCGAGGCT GCAATTCGGG TACGGAGACA TTACAGTGCG CGCACGCGTT TTTTTGCGAA 2520

AAAAATAACT CAGACGGCAG GAGAGCAGAT GTTTCTATCT TTCCGGAAAC GGTCCCAGAA 2580

TTCTCTCCCT GCACTAAGAC GGTCAACAGC CCATCTGCAT AGCCTAGCGT CGTCTCTACT 2640

GATTCTGTTA ATCGTTTACG TACTGTATCT GACAATTGAA TTCTATCGAC AACTATATCG 2700

ATAGAATGCT TTTTTTGCTT ATCCAACGAA ATGCGCTCGT GTAAGTGGAG CAAAGCCCCG 2760

TCAATACGAG CTCGTACAAA ACCATCTTTG CGTGCAGCTT CCAAGACCTT GTGGTGTGTA 2820

CCTTTTTTTC CTCGCACCAC CGGGGCAAGC AACTGAATTC TGCTTCCXX3A CGGCACGGTC 2880

ATGAGGGTAT CAACAATTtG GTCAACX3GTT TGTTCCTtGA TCTCCCGCGC ACAGTGCGGA 2940

CAATGCGCGC GTCCTATGCG GGCAAACAGC AGACGATAGT AGTCATAAAT TTCTGTGACA 3000

GTACCAACCG TTGAGCGAGG GTTACGCTGc GTAGTTTTTT GCTCGATGGC AATCGCAGGA 3060

GAAAGACCCT CGATAGA6TC AACATCCGGC TTATCTAACC GACCTAAAAA CTGGCGAGCG 3120

TATGCAGAAA GGGACTCCAC ATACCX3ACGC TGTCCTTCTG CAAAAATAGT ATCAAACGCA 3180

AGCGAACTCT TGCCTGAACC AGAAAGACCG GAGATCACCA CAAGCGCATC TCGCGGCAAC 3240

ATAACATCAA TATTCTTCAG ATTATGCTCA CGCGCACCCT TTATACACAG ATTACGAGCA 3300

GCAGAGCCCA CGCGAACGTC CTGCGACACG TTCTCCCTTT CTACC3CGATC CCCATCCATA 3360

A6AGGGCAGA CTATGCGCAA TTTTTATGCT TTATGCAATT CCACTCCCTT CGCGGCAGAC 3420 

CGCATTACAC CGTTCCTCCA AGCAACX3CGA CAATTTTTTC ACTCTGCGCA GACT6TACGA 3480 

TGAACATCAG CACACTTCCC TCAAGCAGAA CCGTATCTCC TGAAGGAATG AAAGAGCCAC 3540 

GCACCX3TTGA GATGAGCAGC ACCAAGAAGC TTCCGTGCAC AGCX3ATATCC TTCAGGCGCT 3600 

TGCCTACCAG GGGGGACTGC GCGGAAATAG CGAACTCAAC GATTTCTAGC GATCCGTCGC 3660 

CX5ATAGTATG TATGCCAGTG ACGTGGGAAC CX3GCX:AAATG GCTCATAATA GCGTCAACCA 3720 

cTACGTCTTG ATAAGAAACA GCAACATCGA TGCCAATTTT CCCCGCAATA TCCTCCATAA 3780 

GGGAGCTGTG TACCAATGCA ACAGCCCGAG GCACTCCGAG CGTCTTCATG TATGCGGCTG 3840 

TAATCATATT CAGCTCATAG TTATTAGTGG TGGTAATCAC CAGATCAAAC GTGTCCGGCG 3900 

TAATCTCTGC GAAAAAAGCC TCATCTGTGA CATCACCATG ATAGGCAGTA ACGTGCGGAA 3960 

ATTGAGCACA CACTGCCTGG GTTGcCtTTC ACTCTTATCA ACCAATACAA GACTCGCACG 4020 

CTCCCTTGGA GAAAGACTGA AGGCACTACT GAAAAAGTGC GGCTTGCATT TTTCTGCTAC 4080 

ATCCTGTGCC ACGAGCGTAC CTACCX3CGCT CATGCCAATG AGTGCAATTT TTTTTACCGG 4140 

ATGTATTTTA AAACCCGCCA GCTCATAAAA ACGTCCCATA TGTTCAGGCG CACAGAGTAC 4200 

TGACAGGCGC ATACCAGAAG CGAGCATGGT CTCCCCTGAG GGAATTACAC TCCTCCCCCG 4260 

AACTTCAAAA GCAACGGCAA CAAAAGAAAT TTTTACXy^GA CGACGCATAT CAGAGAGCGT 4320 

GATACCATCG AGGCCGCTGC CCTTTGCAAT AGGAAAACGG GCAATTTCAT ACGGTGCATT 4380 

TTTCAATGGG ATGACATCGC TGATGGCACC CTGCTCGACG GTGCTCACTA CCGCACGCAT 4440 

CX3CTTCCTTA TCCgCAGATA TGAGAAAGTC AATACCAAAA ATACAGCGCG ACTCACGACA 4500 

CACCGCGTGA GcGTAGTGGT CATCGTGCGT TTGGGCTATT TTAATCACTC CGGCATTCAA 4560 

GTCGGCGGCT ATACCACACA GTACTATGTT AAGTTCGTCA ACCTCGGTGA CCGCAACAAA 4620 

CGCCTGTGCC TTTGCGATAC CTGCyPCACC CAGGGTAGCg CGGTGAtCTT TTTGATGACG 4680 

CaCGAGCGCT TGTGCCCCCG CGGGAATTTT ACGG6GGACA GGCTCATGCG CGGCAGCAAC 4740 

AAGCGTAACC TGATGTCCCC TCGCGCTCAA ACGACGCGTA AGTTCACGCC CATTCGTACC 4800 

GCATCCAACA ACAATAACCC TCATGTCCGA AGGCCATAGT AGCACGAAAT TTTTTTGCAT 4860 

GGCCAGCGCG cAGAACaCGg CGcACAACGC CTGCCACTCA TATCTTTTTC AAAAGTACCA 4920 

CTACCTGTGC GGTAACCGCC GCACCAGATC CAACAGGTCC GAGGCGTTCG GCAGTCTTTG 4980 

CCTTAACAAA AACACGTGTT ACGTGCGTGT CCAGGGCCTG CGycAaGcGA TGCGCGCATC 5040 
GCTTCCCGAA ATGGGTGTAA TGCAGGCTGC TCAAGACAGA CAACAGCATC GAGATTCACX: 5100

AGCCGCCAGn CACTGCGCGC ACCAGTTGCC AGGTATGGCG GAGCAACGCG CAAGAATGT6 5160 

CGTCTTTCCA TCGTCCGTCA CAAGAGGGGA AAAACGTGCC AATATCCCCC AGGCCCTTTC 5220 

GCCAGCTGGC GTAATA6CGA A6AGGCCCGC ACCGATCGCC CTTCCCAACA G 5271 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 646 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

AAGTCCTTCT CGTAGCGCTT ACTGTCGGTG AAGCGGGAGT GGAGGACTTG GCCCGGCAGA 60 

TGCAGCAGCC CTCTATGCTC CGGTACATCT TTGGCTTGTG GAAACCAAAC ATCTTCGCTT 120 

CTQGCCAGGT CTGGAAAGAC AAGGCAACAG ACACAGCTCT GCCGAGGCCC TCACAAGAAA 180 

CCATCCGGGG AGCCTGTGCT GCTCCACCCC GGGCCCACCC AGCGCCGTCA GACAGCAGGA 240 

CTCTGTGGAA GGTGGCCTQG ACCCGCCTCC GCTCCTGGCG CTGGCACGGC AAGTGTATGA 300 

CACACAGAA6 AGCTCAGGTG TTCAGGGAGG CCCCCCGCTC TCAGCACTCC CCCACCCCTG 360 

CCCA6CAAAC ATCCTTTCTG AAAATGAGGA AGGGGAGGCT GGTTGGTTTG TTGGCAGGGA 420 

GCCAAGCACT TGAGCCATCA TCTGCTGCCT CCCAGGGTCC ACGTGAGAAG GAAGCTGGAA 480 

TCGGGAGTGG ATCAAGGATT GGAACCCAGG CACTTGCGTA CA6GATATGC TACAAGCTCT 540 

CCTGATAATC CTGTAAAATG ATGAAATCAT TTAGGATGTA TCCTGAAATC TGAGACaAGG 600 

CATACCTTTT CTTCTTGCAT CTTTGAAAGT GaACCCCCCC CCACGC 646



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28295 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GTTCCCTAAA GAAGGGAATG TTTTCTCCtG TGTGTGCAgT CAATGTGCCG GCGTATATyC 60 
CGTTAGAATG GGTGCGCTCC AGTGTC/^GG CGATATCCTG ATAGCGCATT CTGAGCGCTC 120 
GGATATTTTT TTTAAACACA TTAGTCATAT GCTGAGCTTG TTCCTAAGTT GAACGATTTC 180
ATGGCGCACC TTGTGAATGA GAGTTTGTGT TCCCTGAGAA GGGTTTTTTC TCAGGTGAAG 240
TAGCTCAGCG TATGCAAGAA ACTCAAAAGC CTCTTTTTCT ATGCCGGAGC TGGCTATACT 300
GGATGCGATA CGTGCATGGT GTGsACTGCG TGTGATGCGT TCAAAGTAGT GGCGGAgCAC 360
TGCCTTTTTG TGGATGCGCG AAATCGGCGG CCCTTCTGAA AAGGTGAATG TATCCTTTTT 420
GGCGAAGGCG TGTTGGGAGA AATCTTCATA CGTGATGGTA CGTACATTGG AAGGTGGTGC 480
GT6TTCTGTA TTTAGCCGTG TAAGTTTCTG GATTTtCTCT TGTGGGAGAC TGTAAAACCA 540
GTGTGCGTAT GTTTCTAGTG AGCGATTGTC AAAATTTCCT GCAATGCAGt CGCAAGTGGC 600
TCAGTTCTCT TGGGATTAAA ACGTTTGAAG GAAGCATGTG GACGTGCGTG TTGGAATCCC 660
TTTCCCGGGT GCAAATCGAG TCCGGCCAAA TAAATAGAAT GTACCCCACA CTGTAAAAGA 720
TATTCGAGTG CAGTCCCCAT GACACTCCCA TGTCTTTGTG CAGAAACCGA AGGaTGTGCA 780
GGTGTTGCAG AAAGAAGTGT TCTGTATGCG AATGGTAGTT GAGCAAAGAA ATTGGAGAGT 840
ATTTAAACAC ACGTGGAGGA ATGCACGCTT CTAAGGGAAA CAACACCGGA AGGGTAGGCG 900
CTG6CGGAAA ATGCTCTGCG GCCCAAAAAC TGCCGTCCGT GCTCATGCAG ATATCAGGAG 960
AAATATTGCG GTAGAGAAGT GCCTGCAGTG CTGAGGAGAC CGCAACTATG GGAAGTCCGA 1020
CTCGGGTGAT TTGCTCGAGT CCGAAACCTG cAGCCACACA CAACACTTCC GGTGCGGTCA 1080
GGCACACCgT ACCGGGcGCT CAAGAAAAAA TACATTTCTC AGTGTATTCA GTAACCAACXS 1140
TTTTCCGAAG TATATGCGCG TGGCGATTTC ACTCTGAATT ACTTTGATGG TATAgGTGAT 1200
TTCTTGCCAT GTGGACTGAG CTTGTGCGCG CCATATTCGG TTTGCTGGCT CCCAGGGAAC 1260
AAACGCGGTT TGCCCGAGCA ATTCGTCAGG GATATGATTA ATTAAGAAAG AATGCAAACT 1320
GCCGCAGTCT GGTCTCCAGA CTGCGTCCCA CTGGATATCG GATGAAGTGA ACGCATCGTT 1380
TGTGTATCGA ACTGCGATAA GCTTTGCATG TGGAAAGCGT GCCCX3CAAAA ACTCCGCGGT 1440
GTACGACTCA CXTCGGCTCTG TTATTACCAC AATGCGGCGA CCTTCTAGGA TCGCAGCGAC 1500
AAAACGCTCC GCCTCCCTTC TTGGGTTATA TTTTGAGTGG AGGTTCAGCG ACACGTCTCC 1560
CCGGTTTCGT ACTGCAAGAA ACGGAGGACG TCCGTCTGCG CGCGAGTAGT TCCACTGCTT 1620
GTGATGAGAA GTGAATTCTC CGCGTGGTTA ACAAAGTACG CGCTGACGCC ACGGGTAAGT 1680
TGGTTTAGGT ACAGCTCAGG ATCGACGGGA AAGGAAGAAA AGACCACAAC GCGCActTCC 1740
CX3GGAAGCGA CGCATGAGGC AAGGTTGTCG TTCCCTACGA GCGTTAATAG CTGGTTGCTC 1800
AGCGCACAGC AGCGGCGTAC CATACTCAGC TCTTCCTGAA GAACGGCGGG GGGCAACX3CA 1860
CTcTGGTTGA TGCTGATAGT GAACAGGTTT GCCATCCGCT GATTGTTCAG GGCGAACTCG 1920

AGCTGGGAGC AGCTCGGTGC CAACTCCGGC TGTcGCGGAT ACACX;GGCCT ACTCGCCTCG 1980 

ATAAACGAGA AGCGCGCCTG TACGGCaAGG ATAAAGGAAC GCACGCGCTC CTCGCCTGAT 2040 

GCaTCACCGT GAAGGTCTGC aGCGAGGGCA CCTTCCTCGA CGTCGGTTCG GGTATCAAAG 2100 

AGCACTAGGT ATACGCGCAC GCTTCCAGCX3 GAAAACGGAT ACAAGTGGAG CGCACGCAGC 2160 

GTGCTGAATA GATGGGAAGA GAAGTAACCC TGCAATTCGG TAAGTTCCTG CCTGAACAGG 2220 

GTCGTCCACT GCTCCGCAGg CAGGCCGGGC GAAAGGAGGG AGGTGGGAAC ACGCATACGG 2280 

TAAAACGTGG TGGTGTCAAT CCCGAAGGGa AACGCAGgTC AAACATGCGC TCTTCAGTGA 2340 

GAAAGAGAAA ACCCGCGCGG GGAATTCCGA GAGATTCAAC CAGCTGTACG AACTGCAGCT 2400 

CCAGGCGGTA GTACTCGCAG GAGACACCCG AGCTACGCCG GATGCGGGAA GCGCGCGCGA 2460

TAAGACCCAT TAGGAAATCC CTAGCTcGTC T^AAGAGGTGT TTGTACGTGT GGAAATACTC 2520 

CGATCGCGCA AACTCCTCGA TCTTCTCCTC GGGCAGACTT TCGAGCAGCC GGTCCATATA 2580 

CCCGAGCACA GAGCGCACCT CGTCGGCAAG GCTCTTTGAG AGGGGAGGTG CAAGGTTGGC 2640 

AGAATCCCCC TGCGCAGGAC GGGCCGCTTC CGATTGGGGC GGGGACTCGA TTGCCACCGG 2700 

CGTGGCAACC TCCGAAGACT CAGAGGCGGC GGTTTCAATC GCGTGGGGAG CAGAAAACTC 2760 

TAGCGAGGGA ACGGAGACGT CCAACTCCTG CTCTTCTGGC AGTGGGAACG ATGTATCTTC 2820 

GCGCTGGGTG TCTTCGTCGT CGAAGACATT TTCAGTATTG AGGGGCTGCT CGGCACCX3GC 2880 

TGCATCGAGG CCGGTGGAGG AAACGTGCGC CCGTTGAGCC TCAGGCGATG CGTCATCTTT 2940 

GGCAACAGGA GACGGATCGG CCGCAGCGTC ACGATCCGCG TGAGCCCCCT GCTCTTTCAC 3000 

ATCGAAAAnT TCGTTTCCAA GAGTAGCACG CCCCTGTTGT ACCCACTGCT CCtGcgCGAC 3060 

GCgCGCTGcA GCGTGGTCGT CCTCCTCTAG GTAGCTGAAA TCGCCTGCAG CGGACCCGAT 3120 

GTGAGAGGAC TCAAAACCAC TGAGTTCACC ACCAAATGCG TCGGCCGCAG AGTCTTTCCC 3180 

ATCTTCCTCG GTAAACTCAG AGGTGATGAG GATATTGTTC AGCTCATCGT TGGTAAGCGC 3240 

AATCGTCTCG TCTGGATCAT CGTCACAGAA AAACCCGGAG TAGGCAGCCT CTGTTCCTTC 3300 

CGGAGCCX3GA GCGGTOGCGG GA6CCTGAGC AGGCTCGGCG CCAGCTTGCC GGGAGAAGGT 3360 

TCCCTTCAGC TGGTCTAGAT CAGCGCGGAT ACTGCTGATT TcCTGTGCAA TTTTCAGAAG 3420 

CAAATCGGTG GAAGCATCGC CTGaGGTACT GGCTGCGCCC CTGCGCGGCC ACCTGGGCAT 3480 

CACCCATGAG GGACTGCXSCa ATGCGTCCAC GTCACTGAAT GGGGCGCCGC GTACCGGTCC 3540 

TTCCGAAGAA GGCTCCGAAG CaGaGTAArT GATAGGGTCG GAATCGACCG GGTATTCCGG 3600 
TGCGTCGTGG CTCAAACTCG AGAGCAAGTC GTCGAACTCT GTGGTGTCGC ACGACCCGCT 3660

GGGGCAGGGC ATGCTGCACC CGCGCGGCTC TGGCX5GCTCT TCGGTCACAA CGGAGCTCAT 3720

CGCTGCCGAT AGATCAACTC TGTCGTCCTG AGACTGCGCG GTGCGTGTGA CCAAATCGTC 3780

GAACGACGGC ACCTCCGAGT AAGCGGCATC TCCGGtGGCG CATCTGCCGC CTGGTCGGAG 3840

GAGGACTCGC CACCCGGAGC GTGGCTGCAC CCGCGAGCGA GCACCTCGTC CTCCACGGAG 3900

AGGTCAACAA AGCCACGAGA ATCCTCTGCA GAGGACTGCA CAGGAAGCGC ACGGTACACA 3960

TGGTGCCGCA CCGCATCCAC GCGCGCACCC GGGGGAGCGT GTTGGGCAGG CX3CCTCTGGA 4020

TCCTCTGGGC CACCTGAGCT GACGGGTGGC TCCTGCGCTT CCGGCGGTAC CTTTACCCAC 4080

ACGCCACACG CATCCCAGGC ACGGTCTTCC CCCTGCTCCT GAGGGCGTGC AGTTGCGTGT 4140

GGACTGTCCA TTTCTGCTGA CTCTGATCCC ATAGCATCTT CCTCTGTGCC GTGAACTCCC 4200

TCACACATCC TGTGTCGGCA GCCGTGCCGC AAAGCATGAG CATAGCGTGT GCGGTCCGTT 4260

TTTTCAACAA TTCGTAAAAG CACCCCGTTC TTACGCCGTA AATGACGGCA ACGCGCGGGT 4320

GCGGAAGGCA CCTGCAGTGC CTGTGTCATT ACCrcCTGGC GGGGGGGGGT ACGGACAAAA 4380

ACAGGGGTGA AGATGTGGAA GACAGTGCAG ACACGCAGTC AGAGGAAGCG GTGAGGACTC 4440

TATGCGCATT TACTTGAGGG TAGTACTTCC CCTGTCTCTT GCGCTGAACA GCTACGGTGT 4500

ACTCGCCTTT TTCTGGGGAG AGCGGQGGGT GTGTGCCATG CGGCTACTGG AACGTGAGAA 4560

AAAGGAGCTC GTCCATCACA TCCAGACGCT CGCAGAGCGT GGGCGCGACT TGGCTGCGGT 4620

GGTGGACX3CT CTATCCTTTG ACGAAGAGAC TATCGGTGCG TATGCX3CX3TC AGCTAGGATA 4680

TGTCCGCGCG GGGGATGTGT TAGTGAGGCX: GGTAAACTTT ACCnTTGCGC ACATGCATAC 4740

CCTgATTCTG GGGATGCACG TCCGCTtGTT GCACCTGCAT GTTTTAGCGA CACGCGTTGA 4800

CAAGGTGTAC GCGCTGTkCs TcGGCTTcTT TgTCGTCCTG TTACAGCTGC TGTGGGGTAG 4860

CGCGCGTGCG TATTTTAAAA CATGAGGCGC AGTCTGCaCXS CTnTGCgCgfT GCGCTCAAGG 4920

CCGGTGCGCT CGTAGCGTTG CCGACAGATA CGGTGTACGG TTTCTCTGGC CTTGTGCCAC 4980

ACGCTGTTCC GGATCTCATA TGTCTGAAGG CGCGT6GGTG CACAGA6ACG GAAGGGAACC 5040

GGAGAGAGGG CTATCCGTTC ATTGCACTGC TTGCAGATCC ACAGGACX3TG GTTGTCTATA 5100

CCGGGACGCG GCTTCTGCGG AGTTTCGTGC GCTGTGGCCT GGCCCGTATA CGTTTGTntG 5160

CX3CATGCAAG ACGGCGCX3AC GCAGGcgTTC CGCTGTCCTG CTGACCTGTG cTGCGCTCAG 5220

TGATACGGGC AGTTGGGGGA GCGATCTTTT CCACGAGTGC AAATCGGCAC GGCGAGCCGC 5280

CGCTGCAAGA TGCACAGGAC ATCGACCACA TCTTTGGAAA GCATCTTGCG CTGACCGTAG 5340

ACGCAGGACC ACTGACCGGC TCCCCAAGCG CGGTGATAGA CCTCACGCAC CCCGTGcCX^C 5400



GTGTGctCCG CGCTGGTGCG GCXXrCGTTGC CTCTTGCAGG ACTGGAAAGG CGTGACTCTC 5460 
CTTCCCTCCC TCATGTGGGG GAAGTATGTA AAGAATGAGG TCAGTTGCGG ACCCACTCCT 5520

CAGGCACCTC TTCATACTCA ACAGAAGCGG TGTGCCCTGC GCCCCGCACX5 ACAGGGGCGG 5580 

TTCTGACGCC TGAAAGCCGG TTGTTTCCTG AGATCTTTGG GAACCTCCAC TGGAGCGGAC 5640 

GCGCCTCcTG CGCGAGCTTC TGAGCGATCT CAGGGGGAAA ATGAGAAAAC TGTCGCGCAT 5700 

TGACTGCCCC CTTCTGTGAC ACCACGAGCA CTCCX5TCTGA AATTTTCAGC TTGAGGGGAA 5760

CX^GAAAGTCC GGAGCGCAAC ACcGCGATGC CGCGGCCTTC GCTCACGATG ACGATTCTTT 5820 

CCCCTTCCTC TGTGCTGTGC CAGGAACCGG CGAGGGCATC TAGGGACGAA ACGGATTCTT 5880 

CACCTGCGCG CGTGTGCCGC ATGCTAGACG TCTCTGTCTG ATTTCCTGTC AGCGGGACAG 5940 

AACGGTCAAA CACATCTCGG ACCAGGTGGC GCGAGTCAAG CAGAATGCGC GCTGCCGTCT 6000 

CATATGTTTT GGAGAGCAGC CGCGTGGCGT TGTGATCTTT AmCCTTGAGT GCAACAGCCA 6060 

GTCTAATCCC CTCAGGGGTG ArGTCCATCG C6CCGCAAAA TATGTAGTCA AGATTCCCTT 6120 

TTTCGGGAAA ACGGTGCX3GC ACCGCTTGCT CTCTGCAATC TACAAinGTGA TACCCACGCA 6180 

ACTCCCGAAT GAAAgAAAAG AGCGCGTCGT TGATGGTAGT TTCTGTGTGC GCAGGCACAC 6240 

CAgACACTTc TAGCCTGTAG ACGCCAACAC GAGGAGCTGC GTGAACAGCA T6CGCGAACA 6300 

GCACGAACAG GATAAGCAGC CGAGAAGAAC GGACACCTTT TTTCATGAGA CTAGTGGTGT 6360 

CGCTCACAGA GGCTGCGGGA CAGCTCCCGT GCGTTGTCGC GAGCTTTGAT CTGCGCGCGC 6420 

TTGTCAAAAA GCTTCTTGCC CTTGCAGATT CCCAGCGCTA CCTTCACCCG CCCTGCTTTT 6480

AGGTAAAACT CCAGGGGGAC CAGAGTATAG CCTTTCTCTT CAACCTTGCG CTTCAAGCGC 6540 

GCAATCTQGT CCCX3ATGTGC CAGTAACTTC CGCATCCGAT CCGGATTGGG GGCAAAGGAG 6600 

CAAGCATGCA CX5TACTCCGC AATAT6CACA TTCTTTAGCC ACAGCTCGCC TCCGCGCATC 6660 

TCTGcAAATG CGTCAGGAAA AGAAAGATGC CCCGCGCGCA CAGACTTCAC CTCCGTGCCT 6720 

TCAAGCGC6A TGCCACACTC TAGACX5GTCT TCCACATGGT AATTGAAAAA AGCcTTGCGG 6780 

TTCTTTGCAA TGAGATQGGT TCCTGTGCCC CTCATGGCGC CGGATGCTAC CGGATAGGCA 6840 

CTTCCCTTGT CAATTCGATT ATCGCCGTGT TAGGCTGCCG TGTCTGGGAG GGACGCCX3TT 6900 

TTATGTTTGC GCGGTGGAGA AGGTACTCAT ATTTGGCGCG GCGCGAAGCA CGGCX3GAATG 6960 

CGACCGCAGT TTGTAGTgCT GGGGTGGGCT TCTTTCTGTT CTATCTTTTT ATCACTACGC 7020 

ATGTGGTTGC AGCX3TATCGC ATTCAGgCGG ACTCGATGCA GCCGACCCTG AGCGCAGGGG 7080 
ATTGCX3TTCT TGCCTCGTCC CTGTTTCGCT TTGCCCGCAT CAAGCGGGGG GATTTGGTGC 7140

TTGCAACTCC CCTTGAGAAA GAGGATATAG GCCTGTTTAA AAGGGCGATG AATGCTGTGT 7200 

TnAGgnTTCG CAAGCCTTCA ATTGTACCGG CCGTTTGGCG CGGCAGATCG CATGTTTTCG 7260 

CGGCCGCAAA TGCGCAGGGT GGTGGGCCTT CCAGGGGACA CTGTCTATAT GCGCGATTTT 7320 

GTGCTGTACG TTAAGCXXTCA CGGTCAGCAA CACTTCCTCA CGGAATTTGA AGTGAGTGCA 7380 

6TTAGCTACG ACGTGCGTAA GGGGGTGCTT CCTGAGCATT GGTCTGAACG GCTTCCCTTT 7440 

TCTGGTTTCA TGGAAGAGAT GCAGTTGGAC GAGCACTCCT ATTCGTGCTG TGCGATAATC 7500 

GAATTGTCTC CAGTGATTCT CGTCTGTGGG GTGCCATCGA CGGTAGTACG CAGATAAAAG 7560 

CAAAGGCATT CATGCGTTAT TTCCCTTTCG GAGCATTTGG TGTCTTGTAG TGTGTAGGCG 7620 

CCGCATTTGT GGTGCGTGtG CGCATCGTGC TGTTCCTTTT ATCATGTCTT CTGAGGTCGG 7680 

TGCGTCTTTG TACX3TGCACA TCCCCTTCTG TGCGCAACGC TGTGCTTACT GCGATTTTTA 7740 

CTCCCTGGTG CGTTCAACCT ATTTTAGGCC TCATCAGCCT TGTCCGCATT TTATCGATCG 7800 

GCTGCTACAG GATGTGGCAT TGCAGCGGGA GTGCTTTGGG GTCCAGGGkT GGCAGACAGT 7860 

GTATATGGGT GGAGgTACCC CTTCGCTATT GGCACCGCAG GACATTCGTC ATTTTTGCGT 7920 

AGCGTTACGC GCCGCGCAGC GGTATCCGAT TCAGGAGTTC ACTCTTGAGG TGAATCCTGA 7980 

GGATGTGACC GAAGAGTTTT TGTGTGCGTG TGCAGAAGGC GGAGTAAACC GTTTATCCCT 8040 

TGGGGTACAA AGTCTGCGTG ATGAGGTGTT GCGTGCGGAG CGTCGTGCAG CCTCTGCTGA 8100 

ATGTGCTCGT ACCCGcTCCG CGTGATGACG GCAAATGCGC GCTTTTTCTC TGGCOGGGT6 8160 

CGTATTTCAG CAGATCTCAT CGCTGGATTG CGCGGGCAAA CGGCGCGAAT GGTGCGTGAG 8220 

GATaTAGATG AGCTTTTGTC TTTTGGGCTG AGACACGTGT CGGTATATGG GTTGTGTGTA 8280 

CCGCATCCGA CTGAAACGCA AGAGGAGCGA ATT6CAGCGC TTTGGGCACA CGGCAGCGCX3 8340 

TATCTGGTGC GTGCaGGATT TAACCGGTAT GAGCTTTCGA ATTTTGCACG TACTGC9GCG 8400 

GACGAGA6CG CGCACAACAG AGCATATTG6 CGGATGGCAC CGCACGCAGG GGTGGGGCCT 8460 

GGCGCAGTTG GCACGCGTTT TGTCAACCTT TCTTTATCAA AGGAGGGGGC GTGGGCGATC 8520 

CGCAGCACGG TGCGGAAACA TCTTGGCCAA TACTTAGCAG AAGTGTGTCG GGAAAATGTG 8580 

TATGAGCACG AATTCCTTAC AGAACATATG T6TGTGCAAG AAGCATTGTT AATGGGATTA 8640 

CGTCTTGAAC AGGGACTGGA TGTGGTTACA TTTCGTGCGC GGTTCGGGAA GGGAATTCAA 8700 

GCGTACATTG GCAAAACAAT CGCGCGGTGG CAGTGTCATG GCCGAATGCA GCGGACGGCG 8760 

ACGTCATTGC GTTTGAGTGC GCAGgCACGG GTATTTCTGG ACAGTTTTTT GCGAGAGGCG 8820 